Journal of Applied Psychology


Edited by

John G. Darley
University of Minnesota

Lorraine Bouthilet, Managing Editor


Consulting Editors 


Harold E. Burtt, Ohio State University
Alphonse Chapanis, Johns Hopkins University
Clifford E. Jurgensen, Minneapolis Gas Company
Laurence S. McGaughran, University of Houston
Quinn McNemar, Stanford University
Alexander Mintz, City College of New York
Harold F. Rothe, Fairbanks, Morse and Company
Julian B. Rotter, Ohio State University
Donald E. Super, Teachers College, Columbia University
Miles A. Tinker, University of Minnesota
Alfred C. Welch, University of New Mexico


Published bimonthly by the American Psychological Association, Inc. 


Prince and Lemon Sts., Lancaster, Pa. and 1333 16th St. N.W., 
Washington 6, D. C. 


Entered as second-class matter, August 19, 1943, at the post office at Lancaster, Pa., under the act of March 3, 1879 


Acceptance for mailing at the special rate provided for in paragraph (d-2), Section 34.40, […] authorized Octo[…]

Copyright © 1956 by the American Psychological Association, Inc.


Contents of Volume 40 


Aalto, B. P. A Scale Measuring Attitudes Toward Working for the Government .... 398
Anderson, N. H., Grant, D. A., and Nystrom, C. O. The Influence of Spatial Positioning of Stimulus and Response Components on Performance of a Repetitive Key-Pressing Task .... 137
Averbach, E. See Lincoln, R. S.
Aylward, M. See Dunnette, M. D.
Bair, J. T., Lockman, R. F., and Martoccia, C. T. Validity and Factor Analysis of Naval Air Training Predictor and Criterion Measures .... 213
Baker, C. A., and Vanderplas, J. M. Speed and Accuracy of Scale Reading as a Function of the Number of Reference Markers .... 307
Barry, J. R. See Sells, S. B.
Barthol, R. P., and Kirk, B. A. The Selection of Graduate Students in Public Health Education .... 159
Barthol, R. P., and Zeigler, M. Evaluation of a Supervisory Training Program with How Supervise? .... 403
Barthol, R. See Ghiselli, E. E.
Bass, B. M. Development of a Structured Disguised Personality Test .... 393
Bass, B. M. Leadership Opinions as Forecasts of Supervisory Success .... 5
Bass, B. M., and Wurster, C. R. Using “Mark Sense” for Ratings and Personal Data Collection on […] .... 269
Bennett, E. M. See Hall, N. B., Jr.
Benson, P. H., and Platten, J. H., Jr. Preference Measurement by the Methods of Successive Intervals and Monetary Estimates .... 412
Birmingham, H. P. See Chernikoff, R.
Bloom, R. See Smith, K. U.
Boneau, C. A. See Weaver, H. B.
Bridgman, C. S., and Wade, E. A. Optimum Letter Size for a Given Display Area .... 378
Browne, C. G., and Shore, R. P. Leadership and Predictive Abstracting .... 112
Burwen, L. S., Campbell, D. T., and Kidd, J. The Use of a Sentence Completion Test in Measuring Attitudes Toward Superiors and Subordinates .... 248
Campbell, D. T. See Burwen, L. S.
Chernikoff, R., Birmingham, H. P., and Taylor, F. V. A Comparison of Pursuit and Compensatory Tracking in a Simulated Aircraft Control Loop .... 47
Chinn, H. I. See Sells, S. B.
Churchill, A. V. Comparison of Two Visual Display Presentations .... 135
Churchill, A. V. The Effect of Scale Interval Length and Pointer Clearance on Speed and Accuracy of Interpolation .... 358
Clark, K. E., and Jones, R. L. Changes in Attitudes Toward a Low-Rent Housing […] .... 201
Cleven, W. A., and Fiedler, F. E. Interpersonal Perceptions of Open-Hearth Foremen and Steel Production .... 312
Cuadra, C. A. A New Technique for Rapid Item Analysis .... 187
Danielson, L. E. See Maier, N. R. F.
Decker, R. L. An Item Analysis of How Supervise? Using Both Internal and External Criteria .... 406
Dunnette, M. D., and Heneman, H. G., Jr. Influence of Scale Administrator on Employee Attitude Responses .... 73




Dunnette, M. D., Uphoff, W. H., and Aylward, M. The Effect of Lack of Information on the Undecided Response in Attitude Surveys
Dvorak, B. J. GATB in Foreign Countries
Edwards, A. L. A Technique for Increasing the Reproducibility of Cumulative Attitude Scales
Evans, R. I. An Examination of Students’ Attitudes Toward Television as a […]
Ewens, W. P. The Development and Standardization of a Preliminary Form of an Activity Experience Inventory: A Measure of Manifest Interest
Fleishman, E. A., and Hempel, W. E., Jr. Factorial Analysis of Complex Psychomotor Performance and Related Skills
Foley, P. J. Evaluation of Angular Digits and Comparisons with a Conventional Set
Frick, J. W., and Keener, H. E. A Validation Study of the Prediction of College Achievement
Ganguli, H. C. Attitudes of Union and Non-Union Employees in a Calcutta Elec[…]
[…] Supervisors
Glickman, A. S. The Naval Knowledge Test
Gold, R. A. See Smith, P. C.
Gordon, L. V., and Stapleton, E. S. Fakability of a Forced-Choice Personality Test Under Realistic High School Employment Conditions
Graham, N. E. The Speed and Accuracy of Reading Horizontal, Vertical, and Circular Scales
Grant, D. A. See Anderson, N. H.
Green, D. See Hecker, D.
Guthrie, M. R. The Measurement of Personal Factors Related to Success of Office Workers
[…]
Hempel, W. E., Jr. See Fleishman, E. A.
Heneman, H. G., Jr. See Dunnette, M. D.
Henson, J. B. See Holland, J. G.
Heron, A. The Effects of Real-Life Motivation on Questionnaire Response .... 65
Hewer, V. H. A Comparison of Successful and Unsuccessful Students in the […]
Hollander, E. P. See Webb, W. B.
Holman, P. A. Validation of an Attitude Scale as a Device for Predicting Behavior


Huiskamp, J., Smader, R. C., and Smith, K. U. Dimensional Analysis of Motion: IX. Comparison of Visual and Nonvisual Control of Component Movements .... 181
Isard, E. S. The Relationship Between Item Ambiguity and Discriminating Power in a Forced-Choice Scale .... 266
Jenkins, W. L. Triserial r—A Neglected Statistic .... 63
Jenkins, W. L. See Schubert, R. G.
Johnson, D. M., and Vidulich, R. N. Experimental Manipulation of the Halo Effect .... 130
Johnson, R. H. See Otterness, W. B.
Jones, R. L. See Clark, K. E.
Keener, H. E. See Frick, J. W.
Kidd, J. See Burwen, L. S.
Kimmel, H. D. The Relationship Between Chi Square and Size of Sample in Two-Celled Tables .... 61
Kimmel, H. D. The Relationship Between Chi Square and Size of Sample: the General Case .... 415
Kirk, B. A. See Barthol, R. P.
Kurke, M. I. Evaluation of a Display Incorporating Quantitative and Check-Reading Characteristics .... 233
Leeds, C. H. Teacher Attitudes and Temperament as Measure of Teacher-Pupil Rapport .... 333
Lincoln, R. S., and Averbach, E. Spatial Factors in Check Reading of Dial Groups .... 105
Lockman, R. F. A Note on Measuring “Understandability” .... 195
Lockman, R. F. See Bair, J. T.
MacCaslin, E. F., and McGuigan, F. J. The Prediction of Rifle Marksmanship .... 341
Maher, H. Age of Nominator and Scores Assigned Nominees of Various Ages .... 55
Maher, H. See Schutter, G.
Maier, N. R. F., and Danielson, L. E. An Evaluation of Two Approaches to Discipline […] .... 319
Martoccia, C. T. See Bair, J. T.
Mason, H. M. A Further Study of Experience-Centered and Requirements-Centered Tests of Job Knowledge .... 14
Matyas, S. M. See Findlay, D. C.
McCornack, R. L. Vocational Interests of Male and Female Social Workers .... 11
McCornack, R. L. A Criticism of Studies Comparing Item-Weighting Methods .... 343
McGuigan, F. J. See MacCaslin, E. F.
Merrill, W. J., Jr., and Bennett, C. A. The Application of Temporal Correlation Techniques in Psychology .... 272
Mintz, A. A Methodological Note on Time Intervals Between Consecutive Accidents .... 189
Nahinsky, I. D. The Influence of Certain Typographical Arrangements upon Span of Visual Comprehension .... 37
Nystrom, C. O. See Anderson, N. H.
Otterness, W. B., Patterson, C. H., Johnson, R. H., and Peterson, L. R. Trade School Norms for Some Commonly Used Tests .... 57
Parker, J. F., Jr., and Hackman, R. C. The Relationship Between Attitude Toward the Army and the Acceptance Accorded QM Items of Issue .... 329
Patterson, C. H. The Prediction of Attrition in Trade School Courses .... 154
Patterson, C. H. See Otterness, W. B.
Payne, R. B. See Hauty, G. T.
Pearson, D. T., Sr. See Samuelson, C. O.
Peterson, L. R. See Otterness, W. B.




Peterson, M. J. Comparison of Flesch Readability Scores with a Test of Reading Comprehension
Platten, J. H., Jr. See Benson, P. H.
Powers, M. K. Permanence of Measured Vocational Interests of Adult Males .... 69
Rogge, H., III. See Findlay, D. C.
Rothschild, G. H. See Shaw, M. E.
Rusmore, J. T. Fakability of the Gordon Personal Profile
Samuelson, C. O., and Pearson, D. T., Sr. Interest Scores in Identifying the Potential Trade School Dropout
Sanua, V. D. A Note on the Spanish Language Form of the Oral Directions Test of Intelligence
Schubert, R. G., and Jenkins, W. L. The Effect of Brief Training on Linear Interpolation
Schutter, G., and Maher, H. Predicting Grade-Point Average with a Forced-Choice […]
Sells, S. B., Barry, J. R., Trites, D. K., and Chinn, H. I. A Test of the Effects of Pregnenolone Methyl Ether on Subjective Feelings of B-29 Crews After a Twelve-[…]
Shaw, M. E., and Rothschild, G. H. Some Effects of Prolonged Experience in […]
Shephard, R. J. A Null-Point Discontinuous Electrical Pursuit Meter
Shore, R. P. See Browne, C. G.
Siegel, L. A Biographical Inventory for Students: I. Construction and Standardization of the Instrument
Siegel, L. A Biographical Inventory for Students: II. Validation of the Instrument
Simon, J. R. The Duration of Movement Components in a Repetitive Task as a Function of the Locus of a Perceptual Cue
Simon, J. R., and Smith, K. U. Theory and Analysis of Component Errors in Aided Pursuit Tracking in Relation to Target Speed and Aided-Tracking Time Constant
Sinick, D. Encouragement, Anxiety, and Test Performance
Smader, R. C. See Huiskamp, J.
Smith, K. U., and Bloom, R. The Electronic Handwriting Analyzer and Motion […]
Smith, K. U. See Hecker, D.
Smith, K. U. See Huiskamp, J.
Smith, K. U. See Simon, J. R.
Smith, P. C., and Gold, R. A. Prediction of Success from Examination of Performance During the Training Period
Smith, P. C. See Taylor, J. G.
Smith, R. G., Jr. See Staudohar, F. T.
Soar, R. S. Personal History Data as a Predictor of Success in Service Station Management
[…] Inventory
Stapleton, E. S. See Gordon, L. V.
Staudohar, F. T., and Smith, R. G., Jr. The Contribution of Lecture Supplements to the Effectiveness of an Attitudinal Film
Taylor, F. V. See Chernikoff, R.
Taylor, J. G., and Smith, P. C. An Investigation of the Shape of Learning Curves for Industrial Motor Tasks




Taylor, W. L. “Cloze” Readability Scores as Indices of Individual Differences in Comprehension and Aptitude .... 318
Trites, D. K. See Sells, S. B.
Uphoff, W. H. See Dunnette, M. D.
Vanderplas, J. M. See Baker, C. A.
Venables, P. Car-Driving Consistency and Measures of Personality .... 21
Vidulich, R. N. See Johnson, D. M.
Wade, E. A. See Bridgman, C. S.
Wallon, E. J. See Webb, W. B.
Weaver, H. B., and Boneau, C. A. Equivalence of Forms of the Wonderlic Personnel Test: A Study of Reliability and Interchangeability .... 127
Webb, W. B., and Hollander, E. P. Comparison of Three Morale Measures: A Survey, Pooled Group Judgments, and Self Evaluations .... 17
Webb, W. B., and Wallon, E. J. Comprehension by Reading versus Hearing .... 237
Weitz, J. Job Expectancy and Survival .... 245
Witkin, A. A. Differential Interest Patterns in Salesmen .... 338
Wurster, C. R. See Bass, B. M.
Yonge, K. A. The Value of the Interview: An Orientation and a Pilot Study .... 25
Zeigler, M. See Barthol, R. P.
Zimmer, H. Validity of Extrapolating Nonresponse Bias from Mail Questionnaire Follow-ups .... 1


Journal of Applied Psychology 


Vol. 40, No. 2


APRIL, 1956 


The Effects of Real-Life Motivation on Questionnaire Response


Alastair Heron 


Medical Research Council, London 


There is abundant evidence in this Journal 
(2, 4, 8) and elsewhere (1, 5, 9, 10, 13, 14) 
that if university and college students are in- 
structed to do so, they can improve their 
score on most adjustment questionnaires and 
inventories. As far as the present writer can 
find, there is remarkably little evidence con- 
cerning what actually happens in real life. 
Green (6) published a valuable study in 
1951, quoting two other papers, one con- 
cerned like himself with a police situation 
(12), the other based on a single case his- 
tory (11). Green’s study showed that in a 
police selection situation an applicant group 
produced significantly more favorable mean 
scores than did a “secure tenure” group on 
sections S, T, D, and C of Guilford’s STDCR 
Inventory, on Section A of his GAMIN In- 
ventory, and on the Mechanical, Scientific, 
Persuasive, Artistic, and Social Service sec- 
tions of the Kuder Preference Record. In 
presenting his material, however, Green made
it clear that he found it difficult to ensure
that his two groups were in every respect
comparable. His samples were also rather
small (N's = 45 and 70). The present paper
presents the results of a fairly large-scale
experiment designed from the outset to test
the hypothesis that with personality-questionnaire
material, response distributions under
selection conditions will differ significantly
from those obtained under research conditions,
all other variables having been fully
controlled.¹


¹ Paper presented to the Twelfth International
Congress of Applied Psychology, London, July, 1955.


Method 


Subjects. The Ss for this experiment were 400 
male applicants for the job of omnibus conductor, 
between the ages of 18 and 45 and without previous 
experience in this type of work. From test data 
available on a comparable sample, it could be in- 
ferred that intelligence ranged from well above av- 
erage for the general population, to literate subjects 
just above the borderline of statutory mental defect. 
Such applicants would come from the same socio- 
economic background as the great bulk of unskilled 
and semi-skilled factory workers. 

The questionnaire material. The instrument em- 
ployed was a two-part personality measure designed 
by the writer (7) for use as a research criterion 
suited to the varied needs of psychologists in dif- 
ferent fields. The first part consists of 74 items in 
statement form, of which 20 items form the basis 
for a measure of emotional maladjustment. The 
majority of these 20 items, selected after wide use 
in the field, repeated internal and external valida- 
tion, and thorough item analysis, were drawn from 
psychosomatic inventories such as the MMPI and the 
Maudsley Medical Questionnaire (3). Apart from 
eight items of the MMPI Lie Scale, the remaining 
items of this first part are buffers, concerned with 
physical health and intended to convey the general 
appearance of a health inventory. The second part, 
given after an interval during which some other ac- 
tivity is inserted, consists of 36 items, 12 of which 
form the basis for a measure of sociability. The 12 
selected items, like those of the remaining 24, were 
drawn from Guilford’s R (rhathymia) scale. In ad- 
dition to other criteria for selection, it was pro- 
vided that in order to contribute to the total score 
for the second part, an item must not achieve a 
significantly nonchance distribution in respect of 
emotional maladjustment, and also that the items 
selected must not, when considered collectively, 
achieve a significantly nonchance additive chi square 
in respect of their distribution for emotional
maladjustment. The object, of course, was to ensure
that the two measures were as independent of each
other as possible.
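The item-selection rule just described (no sociability item may show a significant association with emotional maladjustment, and the selected items must not jointly yield a significant additive chi square) can be sketched as follows. This is a minimal illustration under stated assumptions only: each item is summarized as a hypothetical 2 × 2 table of response ("true"/"not true") by maladjustment group (high/low), the uncorrected Pearson chi square and a .05 level are used, the item statistics are treated as independent so that they may be summed, and all function names and counts are invented for the example. The article does not report how its tables were formed or which level was applied.

```python
import math

def chi_square_2x2(a, b, c, d):
    """Pearson chi square (no continuity correction) for the 2 x 2 table
    [[a, b], [c, d]]: rows = response ("true", "not true"),
    columns = maladjustment group (high, low). Hypothetical layout."""
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        return 0.0  # an empty row or column carries no evidence of association
    return n * (a * d - b * c) ** 2 / denom

def chi_square_sf(x, df):
    """Upper-tail probability P(chi square with df degrees of freedom >= x),
    from the series for the regularized lower incomplete gamma P(df/2, x/2)."""
    if x <= 0:
        return 1.0
    a, z = df / 2.0, x / 2.0
    term = total = 1.0
    n = 0
    while term > 1e-15 * total:
        n += 1
        term *= z / (a + n)
        total += term
    lower = math.exp(a * math.log(z) - z - math.lgamma(a + 1)) * total
    return max(0.0, 1.0 - lower)

def screen_items(tables, alpha=0.05):
    """Keep items whose own chi square (1 df) is not significant, then test
    the kept items jointly: the additive chi square is the sum of the item
    chi squares, with df equal to the number of items summed."""
    stats = {name: chi_square_2x2(*t) for name, t in tables.items()}
    kept = {name: x for name, x in stats.items() if chi_square_sf(x, 1) >= alpha}
    total = sum(kept.values())
    joint_ok = chi_square_sf(total, len(kept)) >= alpha if kept else True
    return kept, total, joint_ok

# Hypothetical counts (a, b, c, d), for illustration only.
items = {
    "item_A": (30, 28, 20, 22),  # weak association: kept
    "item_B": (40, 10, 10, 40),  # strong association: dropped
}
kept, total, joint_ok = screen_items(items)
print(sorted(kept), round(total, 3), joint_ok)
```

With these invented counts only `item_A` survives the item-level screen, and its chi square (about 0.164 on 1 df) also passes the joint test.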




The split-half self-correlations, corrected to full 
length, were .81 for emotional maladjustment (20 
items) and .74 for sociability (12 items), with N = 
378. With a cutting point placed one sigma above 
the mean for 251 normal respondents and one sigma 
below the mean for 27 hospitalized civilian neurotics 
of mixed diagnoses, all responding under research 
conditions, misclassification was 13 per cent for the 
measure of emotional maladjustment.
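"Corrected to full length" refers to stepping up a split-half correlation to the reliability of the whole test. The article does not name the formula, so the sketch below assumes the usual Spearman-Brown formula for doubling test length, r_full = 2 r_half / (1 + r_half). It also inverts that formula to show which half-test correlations would correspond to the reported .81 and .74. These half-test values illustrate the arithmetic only and are not figures given by the author.

```python
def spearman_brown(r_half: float, k: float = 2.0) -> float:
    """Reliability of a test k times as long as the part whose
    correlation (or reliability) is r_half; k = 2 for split halves."""
    return k * r_half / (1 + (k - 1) * r_half)

def half_from_full(r_full: float) -> float:
    """Invert the two-half case: the split-half r that steps up to r_full."""
    return r_full / (2 - r_full)

for label, r_full in [("emotional maladjustment", 0.81), ("sociability", 0.74)]:
    r_half = half_from_full(r_full)
    print(f"{label}: split-half r = {r_half:.3f} -> corrected {spearman_brown(r_half):.2f}")
```

Under this assumption, the reported coefficients would correspond to uncorrected split-half correlations of about .681 and .587.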

This two-part inventory was prepared so that it 
could be administered in two ways. One method of 
administration was the conventional pencil-and-pa- 
per form, with “true” and “not true” alternatives; 
the other made use of a box-and-card technique. 
Each item was printed on a card identical with those 
used in the card presentation of the MMPI, but all 
cards were then sealed in the central section of a 
three-part wooden container. The S was shown how 
to withdraw one card at a time from a slot at the 
base of this section, and to post it in either the 
“true” or “not true” box on one side or the other 
of the card-storage section. He did not know how 
many cards he would have to do, nor could he con- 
sult earlier or later cards before responding to the 
one in his hand, and he could not change his mind 
about an earlier response. In both methods of ad- 
ministration, general instructions were identical, and 
in particular it was emphasized that if difficulty was 
encountered in making up one’s mind about a par- 
ticular item, the response should be chosen which 
was “more nearly or most often the case, even if 
only by ever so little.” 

Design of the experiment. The selection situation 
in the recruiting center of the public transport de- 
partment concerned lent itself admirably to a simple 
design. All applicants (for a variety of jobs) were 
seen first for a few minutes by one or other of two 
interviewers in the personnel department. Those 
considered likely to justify further expenditure of 
time by themselves and the department were then 




asked to fill out application forms under group su- 
pervision. Applicants then went before one or other 
of three interviewers and their job desires were dis- 
cussed fully. No shortage of vacancies for the job 
of omnibus conductor prevailed at any time during 
the course of this study. If a man was considered 
suitable for this job, he was told that if he passed 
the medical examination and the vision tests, he 
would be accepted for the training school. He then 
passed to the Medical Department waiting room, 
whence he was soon called forward for examination. 
Those passing the medical examination then moved 
forward to the Allocation Room, where they were 
told they had got the job, and were given a card 
with instructions for reporting to the Training 
School. It will be seen that in the eyes of a job ap- 
plicant there were three hurdles: the initial brief in- 
terview, the application form and its consideration 
by the second interviewer, and the examination by 
the Medical Department. When—and only when— 
he reached the Allocation Room did he know he had 
got the job and had no more hurdles. 

Two hundred men were given the two-part per- 
sonality measure as part of the Medical Department 
procedure by a white-coated orderly or by the writer 
acting as one. Two hundred more men were (by 
alternation) allowed to pass through the Medical 
Department without being given the two-part meas- 
ure. (Fully adequate precautions were taken to en- 
sure that these men did not see or know about the 
activities with the measure on the part of their fel- 
low applicants. This was facilitated, of course, by 
the fact that applicants for a variety of jobs were 
passing through simultaneously.) These two hun- 
dred were not approached until they were leaving 
the Allocation Room, having obtained the job. They 
were then asked if they were willing to spare a few
more minutes to assist in a piece of medical research
being carried out independently of the public transport
undertaking, and forming part of a large-scale
enquiry into matters concerned with industrial
health. (This explanation was completely genuine.)
Only 204 men had to be asked in order to obtain
the required 200.

Of each group of 200 men, 100 were given the
two-part measure by the conventional pencil-and-paper
method, and the other 100 by the new box-and-card
method. This design accepted the risk
that there might prove to be an interaction between
method of administration and circumstances of response.
It seemed essential, however, to use this
opportunity for comparing the two methods of administration,
and it was, of course, evident that if
the results did not differ for administration but only
for circumstances, the main point of the experiment
would have been more powerfully made.

It was not possible to replicate the entire experiment
elsewhere, but some valuable data were obtained
from the personnel department of a public
transport undertaking in another part of the country.
This information concerned responses obtained
under routine selection conditions in respect of the
same type of work as in the main study. Applicants
were given the two-part measure at an earlier
stage in the procedure, but otherwise comparably.
Vacancies were available as generally as in the main
study. One hundred forty-five men completed the
measure by the pencil-and-paper method, and 147
by box-and-cards. These data can therefore be appended
as an external check on those shown under
“selection” conditions in the main study.


Table 1
Means and Variances for All Scores, as Described in Text

                      Emotional Maladjustment       Sociability
                      Box          Paper            Box          Paper
                      Mean     σ²  Mean     σ²      Mean     σ²  Mean     σ²
Research (main)       5.75   9.36   5.56  12.81      4.51   5.06   4.50   4.25
  N                    100           100              100           100
Selection (main)      3.61   6.25   3.65   7.56      4.40   4.84   4.60   3.24
  N                    100           100              100           100
Selection (check)     3.65   7.40   3.80  11.97      4.43   2.43   4.58   3.13
  N                    147           145              147           145

Note. 1. Mean scores for emotional maladjustment differ significantly (P = .001) when compared for conditions of administration (Research [main] vs. either Selection [main] or Selection [check]). No other means differ significantly.
2. The only variances to differ significantly (P = .05) are 7.40 vs. 11.97 (Selection [check], box vs. paper, on Emotional Maladjustment) and 4.84 vs. 3.24 (Selection [main], box vs. paper, on Sociability).


[Fig. 1. Score distributions for emotional maladjustment: research conditions (main), N = 200; selection conditions (main), N = 200; selection conditions (check), N = 292, shown as 200. Horizontal axis: emotional maladjustment scores.]


Results and Conclusions 


The means and variances for all scores are 
shown in Table 1, and score distributions for 
emotional maladjustment appear in Fig. 1. 

It can be seen by inspection, without re- 
course to formal tests of significance, that: 

1. The 12-item score for sociability is ap- 
parently unaffected by method of adminis- 
tration or by circumstances of response; 

2. The 20-item score for emotional malad- 
justment is unaffected by method of adminis- 
tration but is seriously affected by circum- 
stances of response, the means and variances 
under selection circumstances being markedly 
and consistently lower than those prevailing 
under research circumstances. 
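The note to Table 1 reports significance levels for the mean differences without naming the test used. As a rough check on the inspection argument above, the sketch below computes a t statistic for two independent groups (Welch's form, each group keeping its own variance) directly from the means, variances, and Ns in Table 1. The choice of test is an assumption of this example, not the author's procedure.

```python
import math

def welch_t(mean1, var1, n1, mean2, var2, n2):
    """t statistic for the difference of two independent means,
    using each group's own variance (no pooling)."""
    return (mean1 - mean2) / math.sqrt(var1 / n1 + var2 / n2)

# (mean, variance, N) from Table 1: Research (main) vs. Selection (main).
comparisons = {
    "Emotional maladjustment, box":   ((5.75, 9.36, 100), (3.61, 6.25, 100)),
    "Emotional maladjustment, paper": ((5.56, 12.81, 100), (3.65, 7.56, 100)),
    "Sociability, box":               ((4.51, 5.06, 100), (4.40, 4.84, 100)),
    "Sociability, paper":             ((4.50, 4.25, 100), (4.60, 3.24, 100)),
}
for label, (research, selection) in comparisons.items():
    print(f"{label}: t = {welch_t(*research, *selection):.2f}")
```

The emotional-maladjustment comparisons give t of roughly 5.4 and 4.2, while both sociability comparisons stay well under 1 in absolute value. This matches the pattern described in points 1 and 2.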




From Fig. 1² it will be clear that there is
no tendency for high scores only to be af- 
fected, with a resultant median bunching, but 
rather for a general downward shift through- 
out the scoring range. It is of considerable 
interest to note that under research condi- 
tions 24 men out of 200 (12 per cent of the 
sample) obtain scores of 10 or greater; with 
the cutting point described earlier, these 
would be regarded as “probably maladjusted.” 
This figure of 12 per cent accords well with 
other data obtained in earlier work carried 
out by the Unit to which the writer belongs, 
and by others. By contrast, under selection 
conditions only three men out of 200 (1½
per cent of the main sample) appear as “probably
maladjusted,” a patently unlikely figure
in such a sample. 
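Footnote 2 explains that the check-sample frequencies in Fig. 1 were scaled down from N = 292 to N = 200 before plotting. The sketch below shows that proportional reduction together with the percentage arithmetic in the paragraph above: 24 of 200 is 12 per cent, and 3 of 200 is 1½ per cent. The frequency table in the example is hypothetical and is not read from Fig. 1.

```python
def rescale(freqs: dict[int, int], from_n: int, to_n: int) -> dict[int, float]:
    """Scale a score -> frequency table from a total of from_n cases
    to a total of to_n cases, keeping the shape of the distribution."""
    return {score: f * to_n / from_n for score, f in freqs.items()}

def percent(count: int, n: int) -> float:
    """Percentage of n represented by count."""
    return 100.0 * count / n

# Hypothetical check-sample frequencies (three score values only), for illustration.
check = {0: 73, 1: 146, 2: 73}
print(rescale(check, from_n=292, to_n=200))
print(percent(24, 200), percent(3, 200))
```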

The hypothesis stated above is therefore 
sustained by the experiment, and would ap- 
pear to justify the conclusion that adjustment 
inventories and psychosomatic questionnaires 
are responded to differently when in a real- 
life situation there is some incentive not to 
admit personality defects. This finding ac- 
cords with those of Green (6), as does the 
negative finding concerning the sociability 
score, this being based entirely on items 
drawn from Guilford’s R scale, which alone 
of the STDCR sections showed no significant 
differences in Green’s study. 


Summary 


Four hundred men between the ages of 18 
and 45 applying for the job of omnibus con- 
ductor were given a two-part personality 
measure, one part concerned with emotional 
maladjustment, the other with sociability. 
An experiment was designed to test the hy- 
pothesis that response distributions under se- 
lection conditions would differ significantly 
from those obtained under research condi- 
tions, when all other variables were fully 
controlled. The hypothesis was clearly sustained
for emotional maladjustment and refuted
for sociability. Subsidiary and fully
comparable data partially replicated the experiment
and served to confirm it. The results
are in accord with the only other comparable
study known to the writer, and appear
to provide long-awaited evidence, concerning
the consequences of real-life motivation on
questionnaire response, of a kind that could
not safely be inferred from many previous
artificial studies of “faking.”


² To permit direct comparison with the two main
samples, figures relating to the check sample have
been reduced proportionately from a basis of N =
292 to one of N = 200.


Received May 11, 1955. 


References 


1. Cross, O. H. A study of faking on the Kuder 
Preference Record. Educ. psychol. Measmt, 
1950, 10, 271-277. 

2. Ellis, A. Recent research with personality in- 
ventories. J. consult. Psychol., 1953, 17, 45- 
49. 

3. Eysenck, H. J. The scientific study of person- 
ality. London: Routledge & Kegan Paul, 
1952. 

4. Faw, V. Situational variations of neurotic scores. 
J. consult. Psychol., 1948, 12, 255-258. 

5. Gough, H. G. Simulated patterns on the Min- 
nesota Multiphasic Personality Inventory. J. 
abnorm. soc. Psychol., 1947, 42, 215-225. 

6. Green, R. F. Does a selection situation induce 
testees to bias their answers on interest and 
temperament tests? Educ. psychol. Measmt, 
1951, 11, 503-515. 

7. Heron, A. A two-part personality measure for
use as a research criterion. Brit. J. Psychol.,
1956, 47, in press.

8. Hunt, H. F. The effect of deliberate deception 
on Minnesota Multiphasic Personality Inven- 
tory performance. J. consult. Psychol., 1948, 
12, 396-402. 

9. Kelly, E. L., Miles, C. C., & Terman, L. M.
Ability to influence one’s score on a pencil
and paper test of personality. Charact. &
Pers., 1936, 4, 206-215.

10. Mais, R. D. Fakability of the classification in- 
ventory scored for self confidence. J. appl. 
Psychol., 1951, 35, 172-174. 

11. Paterson, D. G. Vocational interest inventories 
in selection. Occupations, 1946, 25, 152-153. 

12. Searless, J. R., & Leonard, J. M. Experiments
in the mental testing of Detroit policemen.
Detroit: Detroit Bureau of Government Research,
1936.

13. Steinmetz, H. L. Measuring ability to fake oc- 
cupational interest. J. appl. Psychol., 1932, 
16, 123-130. 

14. Wesman, A. G. Faking personality test scores in 
a simulated employment situation. J. appl. 
Psychol., 1952, 36, 112-113. 




The Journal of Applied Psychology
Vol. 40, No. 2, 1956 


Permanence of Measured Vocational Interests of Adult Males¹


Mabel K. Powers 


University of Minnesota 


All previous published studies on perma- 
nence of vocational interests as measured by 
the Strong Vocational Interest Blank (SVIB) 
have shown remarkable permanence of inter- 
ests (3, 4, 5). However, in all these studies 
the subjects have been high school or college 
students at the time of the first administra- 
tion of the SVIB. It was hypothesized that 
results on adult males representing all levels 
of occupations and a wide range in age might 
be quite different, especially if the time be- 
tween testings was a period of unemployment 
and economic disruption. The present study 
was designed to test this hypothesis. 


Method


Subjects. The subjects (Ss) of this study were 
109 men who were in the sample studied by the 
Minnesota Employment Stabilization Research In- 
stitute in 1931 (2) and again in 1941 as part of a 
follow-up study of the same Institute. In 1931 the 
109 men ranged in age from 16 to 63 years with a 
mean age of 33.7 years. Sixty-three of the 109 were 
in the skilled, semiskilled, and unskilled occupational 
levels, 16 were in lower-level clerical occupations, 
and the remaining 30 were in professional, managerial,
high-level business, and sales occupations.
Ninety-four were unemployed in 1931 and all of the 
men in the sample were unemployed at some time 
during the 10 years between testings. 

Procedure. Since the 1927 form of the SVIB was 
used in the 1931 testing, the answers on the original 
blanks were transferred to Hankes answer sheets 
after eliminating the 20 items not appearing on the 
1938 revision of the test. Both tests for each in- 
dividual were scored on 44 occupational scales. A 
rank-difference correlation coefficient was computed 
for each S using the rank order of the standard scale 
scores on the 44 occupational scales at each testing. 
In addition, a product-moment correlation coefficient 
was computed between total standard scores for the
109 Ss on each of the 44 occupational scales for the
first and second administration of the SVIB. Also,
the differences between means of the 109 Ss on each 
scale for the two testings were computed and checked 
for significance of differences. For those scales where 
the difference between mean standard scores was sig- 
nificant, percentage of overlap was computed. 
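The per-subject rank-difference coefficient described above can be sketched as follows. This is a minimal illustration, not the author's hand computation: it assumes tied scores receive averaged ranks and computes rho as the product-moment correlation of the two rank lists, which equals the familiar 1 − 6Σd²/(n(n² − 1)) when there are no ties. The function names and the four-scale example are hypothetical; the study used 44 scales per subject.

```python
def average_ranks(values):
    """Rank values 1..n in ascending order; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1  # convert 0-based positions to 1-based ranks
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks


def pearson(x, y):
    """Product-moment correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def rank_difference_rho(first_testing, second_testing):
    """Spearman rho between one subject's standard scores at the two testings."""
    return pearson(average_ranks(first_testing), average_ranks(second_testing))


# Hypothetical standard scores on four scales for one subject, 1931 vs. 1941.
rho = rank_difference_rho([45, 38, 30, 22], [41, 40, 27, 25])
```

Because both testings put the four hypothetical scales in the same order, rho here is 1.0; reversing the order of one list would give −1.0.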


1 This paper is based upon a portion of the au- 
thor’s doctoral dissertation which was under the di- 
rection of Prof. D. G. Paterson. 




All answer sheets were then analyzed for the pres- 
ence of Primary (P), Secondary (S), Tertiary (T), 
and Reject (R) patterns in each of the 11 interest 
groups, using Darley’s (1) method for the P, S, and 
T patterns and classifying all others as Rejects. A 
tabulation was made of the difference in patterns on 
the first and second testings for each S in each of 
the 11 interest groups, assigning a difference of 0, 1, 
2, or 3 depending on the change in pattern between 
the two testings. To illustrate, if an S had a Pri- 
mary pattern in Group I in the first testing and a 
Secondary pattern on the second testing, he was as- 
signed a difference of one for that group; if he had 
a Primary pattern in Group I in the first testing and 
a Reject pattern in the second testing, he was as- 
signed a difference of three. These differences, re- 
gardless of sign, were summed for each individual 
and the resulting total was considered the over-all 
difference (D) in profile for each subject. This sim- 
ple test could not be checked for significance because 
the distribution is not known for the measure of 
similarity used. Therefore, the mean of the differ- 
ences for the Ss in this sample was compared with 
the greatest possible difference for any two SVIB 
profiles and also with the greatest possible average 
difference for the 109 Ss in this sample based on 
actual differences. 
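The over-all difference D can be expressed compactly. The sketch below assumes, as the examples in the text imply, that the four patterns are equally spaced steps in the order P, S, T, R, so that P to S counts 1 and P to R counts 3; the dictionary and function names are hypothetical. The same ordering yields the maxima used later in the paper: 3 for an original P or R and 2 for an original S or T.

```python
# Assumed step order of Darley's patterns: Primary, Secondary, Tertiary, Reject.
PATTERN_STEP = {"P": 0, "S": 1, "T": 2, "R": 3}


def profile_difference(first, second):
    """Sum over interest groups of the unsigned step change between two testings."""
    if len(first) != len(second):
        raise ValueError("both profiles must cover the same interest groups")
    return sum(abs(PATTERN_STEP[a] - PATTERN_STEP[b]) for a, b in zip(first, second))


def max_possible_difference(first):
    """Largest D attainable given the first-testing patterns."""
    # From P or R the farthest pattern is 3 steps away; from S or T it is 2.
    return sum(max(PATTERN_STEP[p], 3 - PATTERN_STEP[p]) for p in first)


# The text's examples for Group I: P -> S counts 1, P -> R counts 3.
assert profile_difference(["P"], ["S"]) == 1
assert profile_difference(["P"], ["R"]) == 3
```

For a profile of 11 interest groups, `max_possible_difference` returns 33 when every original pattern is a P or R and 22 when every one is an S or T.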

The Ss were assigned to subsamples on the basis 
of D, using +1 SD, mean, and —1 SD as cutting 
points; the Ss were also assigned to subsamples on 
the basis of the rank-order correlation coefficients 
based on 44 occupational scales, using the same cut- 
ting points. 
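A sketch of the subsample assignment follows. The text names the cutting points (+1 SD, mean, −1 SD) but not how values falling exactly on a cutting point were handled or whether the SD was computed with N or N − 1; the choices below (population SD, boundary values placed as the comments state) are assumptions, and the band labels are hypothetical.

```python
from statistics import mean, pstdev


def assign_subsamples(values):
    """Place each value in one of four bands cut at mean + 1 SD, mean, and mean - 1 SD."""
    m = mean(values)
    sd = pstdev(values)  # assumption: population SD
    labels = []
    for v in values:
        if v > m + sd:
            labels.append("above +1 SD")
        elif v > m:
            labels.append("mean to +1 SD")  # includes v == m + sd
        elif v >= m - sd:
            labels.append("-1 SD to mean")  # includes v == m and v == m - sd
        else:
            labels.append("below -1 SD")
    return labels
```

The same function can band both the D values and the rho values; note that a small D and a large rho both indicate stability, so the bands run in opposite directions for the two measures.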


Results 


Test-retest correlations for individual pro- 
files. When rho coefficients of correlation 


Table 1

Distribution of Rho Coefficients of Correlation Between
Standard Scores

rho              N        rho               N
.80 to .99**    56        .00 to .19        2
.60 to .79**    37        −.20 to −.09      1
.40 to .59**     7        −.40 to −.19*     2
.20 to .39#      4

* One rho significantly different from 0.0 at the .05 level.
** Significantly different from 0.0 at the .01 level.
# One rho significantly different from 0.0 at the .05 level
and one at the .01 level.
Note. Median rho = .80, mean rho = .78; mean rho + 1 SD
= .90, mean rho − 1 SD = .56.




Table 2
Distribution of Correlation Coefficients Between Test-Retest Standard Scores

Occupational Scale                          r*
Artist                                      .74
Psychologist                                .73
Architect                                   .76
Physician                                   .76
Osteopath                                   .56
Dentist                                     .69
Veterinarian                                [illegible]
Mathematician                               [illegible]
Physicist                                   .81
Engineer                                    .82
Chemist                                     .81
Production manager                          .69
Farmer                                      .75
Aviator                                     [illegible]
Carpenter                                   .81
Printer                                     .65
Mathematics-physical science teacher        .61
Industrial arts teacher                     [illegible]
Vocational agriculture teacher              .62
Policeman                                   .72
Forest service man                          .62
Y.M.C.A. physical director                  .62
Personnel director                          .62
Public administrator                        .57
Y.M.C.A. secretary                          .61
Social science high school teacher          .67
City school superintendent                  .59
Minister                                    .59
Musician                                    .65
C.P.A.                                      .74
Senior C.P.A.                               .57
Accountant                                  .78
Office man                                  .74
Purchasing agent                            .67
Banker                                      .82
Mortician                                   .65
Pharmacist                                  .47
Sales manager                               .62
Real estate salesman                        .44
Life insurance salesman                     [illegible]
Advertising man                             [illegible]
Lawyer                                      [illegible]
Author-journalist                           .74
President manufacturing concern             .64

* All r's significantly different from zero at the .001 level.


were computed between ranks for each of the
44 occupational scales in each testing for each
individual, the median rho was +.80 and the
mean rho was +.78, with an SD of .12. The
range of rhos was from +.96 to −.38, with
the distribution markedly negatively skewed.
For 101 of the 109 Ss, rho was significantly
different from zero (beyond the 1% level); for 18, or
16.5 per cent, rho was +.90 or above. The
distribution of rhos is shown in Table 1.

Test-retest correlations for total sample. 
Product-moment correlation coefficients be- 
tween total test-retest scores on individual 
scales ranged from +.82 on the Engineer
scale to +.44 on the Real Estate Salesman
scale, with all r's significantly different
from zero (at the .001 level). By means of trans-
formation to Fisher's z coefficients, the average r for
all 44 scales was found to be +.69. The
correlations are given in Table 2.
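Averaging correlations through the z transformation, as described above, amounts to converting each r to z = atanh(r), taking the arithmetic mean, and converting back with tanh. The sketch below uses equal weights, which suits the case where every scale has the same N (109 here); the two-value example uses the extreme coefficients from the text only to show that the result differs from a plain average of the r's.

```python
import math


def average_correlation(rs):
    """Mean of correlations computed in Fisher's z metric and transformed back to r."""
    z_values = [math.atanh(r) for r in rs]  # z = 0.5 * ln((1 + r) / (1 - r))
    return math.tanh(sum(z_values) / len(z_values))


print(round(average_correlation([0.82, 0.44]), 3))  # 0.672, versus a plain mean of 0.63
```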

Difference in mean scores. The range of
differences between mean standard scores on
separate scales was from −.2 standard scores
on Purchasing Agent to +4.5 standard scores
on Aviator. The differences were significant
at the .01 level for 14 scales and at the .05
level for six additional scales. When percent-
age of overlap was computed for the distribu-
tions of scores on these 20 scales, using Til-
ton's (6) method, the smallest percentage of
overlap found for any scale was 87, for the
Osteopath and Aviator scales. The differ-
ences between means and percentage of over-
lap where differences are significant are shown
in Table 3.
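Tilton's percentage of overlap is commonly presented for two normal distributions with a common standard deviation: if the means differ by D standard deviations, the overlap is 200·Φ(−D/2) per cent, where Φ is the standard normal distribution function. The paper does not state the formula, so the sketch below assumes this normal-theory form, and the SD of 10 standard-score points in the example is a hypothetical value chosen for illustration, not a figure from this study.

```python
from statistics import NormalDist


def percent_overlap(mean_difference, common_sd):
    """Percent overlap of two normal curves with equal SDs whose means differ by mean_difference."""
    d = abs(mean_difference) / common_sd  # separation in SD units
    return 200 * NormalDist().cdf(-d / 2)


# Hypothetical: a 2.6-point shift in means with an assumed common SD of 10.
print(round(percent_overlap(2.6, 10)))  # 90
```

For comparison, Table 3 reports 90 per cent for the Psychologist scale, where the mean difference is 2.6, although the SD actually used there is not given.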

Permanence of group patterns. For the 
109 Ss in this sample, there were 182 P, 150 
S, 518 T, and 349 R patterns for the 1931 
administration and 218 P, 138 S, 515 T, and 
348 R patterns for the 1941 administration. 
The distribution by Groups is shown in 
Table 4. Since the greatest possible differ- 
ence for P’s and R’s is three and for S’s and 
T’s is two, the greatest possible difference 
(D) for the whole sample was 2929. The 
actual difference was 631, just slightly more 
than 20% of the possible difference. 
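The arithmetic behind the figure of 2929 can be checked from the 1931 pattern counts given above: each original P or R can shift by at most three steps and each original S or T by at most two. The short check below reproduces the maximum, shows that 631 is about 21.5 per cent of it, and gives the per-subject figures reported later in the paper.

```python
# 1931 pattern counts for the 109 Ss, taken from the text.
counts_1931 = {"P": 182, "S": 150, "T": 518, "R": 349}
max_step = {"P": 3, "S": 2, "T": 2, "R": 3}

max_total = sum(counts_1931[p] * max_step[p] for p in counts_1931)
actual_total = 631

print(max_total)                                 # 2929
print(round(100 * actual_total / max_total, 1))  # 21.5
print(round(max_total / 109, 1))                 # 26.9, i.e. about 27 per subject
print(round(actual_total / 109, 1))              # 5.8, the mean D for the sample
```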

The greatest possible difference between 
two profiles on any single individual is 33 if 
all original patterns are P’s and/or R’s; if all 




Table 3

Difference in Means for Each Scale on Two Administrations of SVIB (N = 109);
Per Cent of Overlap where Differences are Significant

Occupational Scale                          d              % of Overlap
Artist                                      [illegible]    [illegible]
Psychologist                                2.6**          90
Architect                                   1.1
Physician                                   2.3**          93
Osteopath                                   3.6**          87
Dentist                                     [illegible]
Veterinarian                                [illegible]    [illegible]
Mathematician                               [illegible]
Physicist                                   [illegible]    [illegible]
Engineer                                    4.0**          87
Chemist                                     [illegible]    [illegible]
Production manager                          2.6**          90
Farmer                                      [illegible]    [illegible]
Aviator                                     4.5**          87
Carpenter                                   [illegible]
Printer                                     [illegible]    [illegible]
Mathematics-physical science teacher        [illegible]
Industrial arts teacher                     3.1**          93
Vocational agriculture teacher              [illegible]    [illegible]
Policeman                                   [illegible]
Forest service man                          [illegible]    [illegible]
Y.M.C.A. physical director                  1.7
Personnel director                          [illegible]    [illegible]
Public administrator                        3.1**          89
Y.M.C.A. secretary                          −2.9*          91
Social science high school teacher          −2.9**         92
City school superintendent                  −1.4
Minister                                    −.4
Musician                                    [illegible]    [illegible]
C.P.A.                                      −1.4*          95
Senior C.P.A.                               .1
Accountant                                  −1.7*          95
Office man                                  −2.4*          93
Purchasing agent                            −.2
Banker                                      −1.5*          94
Mortician                                   [illegible]    [illegible]
Pharmacist                                  [illegible]    [illegible]
Sales manager                               [illegible]    [illegible]
Real estate salesman                        −1.8*          [illegible]
Life insurance salesman                     −2.6**         89
Advertising man                             −.3
Lawyer                                      [illegible]
Author-journalist                           [illegible]    [illegible]
President manufacturing concern             [illegible]    [illegible]

* Significant at .05 level.
** Significant at .01 level.


original patterns are S’s or T’s, the greatest 
possible difference is 22. Since the greatest 
possible difference for the 109 Ss in this study 
was 2929, the average for any one S would 
have been 27. The actual differences ranged 
from one to 13 with a mean difference of 5.8, 
median difference of 6.0. The actual distribu- 
tion of differences (D) is shown in Table 5. 

The chi square computed to test the rela-
tionship between permanence of interests as
measured by rank-order correlation and by
difference in group patterns was 50.82 (P <
.001).
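The chi-square test above compares two classifications of the same 109 Ss. Since each measure was cut at +1 SD, the mean, and −1 SD, the cross-tabulation was presumably 4 × 4 with 9 degrees of freedom, although the paper does not print it. A minimal sketch of Pearson's statistic for any r × c table of counts follows; the 2 × 2 tables in the usage are hypothetical, and the P value would still be read from a chi-square table (or a statistics library) with the returned degrees of freedom.

```python
def chi_square(table):
    """Pearson chi-square statistic and degrees of freedom for an r x c table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n  # count expected under independence
            statistic += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return statistic, df


print(chi_square([[5, 5], [5, 5]]))    # (0.0, 1): no association
print(chi_square([[10, 0], [0, 10]]))  # (20.0, 1): complete association
```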


Discussion 


The results of this investigation force a re- 
jection of the hypothesis that adult males, as 
represented by the sample used, may be ex- 
pected to exhibit a lack of permanence in vo- 
cational interests even if the period between 
testings is one of widespread unemployment. 
The remarkably high correlation between the two
profiles of each individual, the high test-retest
correlations of the group's standard scores on
each of the 44 scales, the high degree of
overlap of the distributions of standard scores
on each scale, and the stability of group pat-
terns all lead to the conclusion that the vo-
cational interests of adult males, regardless of
age, aptitudes, education, vocational oppor-
tunity, and economic status, tend to remain
stable for 10 years and presumably over the
long span of adulthood. Since the majority of
the Ss were from the low-level clerical, skilled, 
semiskilled, and unskilled categories, these 
data contribute information about a large 
section of the male labor force which is not 
available from previous studies. Therefore, 
the results of this study support the conclu- 
sion that SVIB may be used with confidence 
at the lower occupational levels as well as 
with college and high school students who 
may be destined for economic competition at 
the higher occupational levels. 




Table 4
Distribution of Primary, Secondary, Tertiary, and Reject Patterns

Group   Year    P     S     T     R
I       1931    3     6     57    43
        1941    7     10    54    38
II      1931    4     7     62    36
        1941    6     6     68    29
III     1931    43    19    35    12
        1941    58    17    28    6
IV      1931    35    16    41    17
        1941    43    23    30    13
V       1931    14    11    51    33
        1941    12    13    52    32
VI      1931    14    11    42    42
        1941    14    16    32    47
VII     1931    6     5     53    45
        1941    5     4     46    54
VIII    1931    25    24    55    5
        1941    25    20    56    8
IX      1931    17    22    51    19
        1941    14    10    56    29
X       1931    5     10    26    68
        1941    10    3     29    67
XI      1931    16    19    45    29
        1941    24    16    44    25
Total   1931    182   150   518   349
        1941    218   138   515   348


Table 5
Distribution of Sums of Differences (D's) Between Patterns

Difference (D)    f
13                [illegible]
12                3
11                2
10                6
9                 [illegible]
8                 [illegible]
7                 17
6                 13
5                 13
4                 12
3                 13
2                 9
1                 6
0                 0
Total             109

Median = 6.0, mean = 5.8, SD = 3.0.


Summary


The Ss of the study were 109 males from
the Minnesota Employment Stabilization Re-
search Institute sample who were tested with
the Strong Vocational Interest Blank in 1931
and again in 1941. The two tests for each
S were scored on the 44 occupational keys.
The age range of the Ss was from 16 to 63
years in 1931, and they were generally from
the middle and lower socioeconomic groups.
Permanence of interests was measured in four
ways: (a) rank-order correlation between test-
retest standard scores on the 44 occupational
scales for each subject, (b) product-moment
correlation coefficients between total test-re-
test standard scores on each of the 44 occu-
pational scales, (c) difference between mean
test and retest standard scores for the total
sample on each of the 44 occupational scales,
with percentage of overlap for those scales
showing statistically significant differences,
and (d) difference in group patterns from
test to retest.

The conclusion reached is that vocational
interests of adult males representing a wide
range in age and socioeconomic status are re-
markably stable when permanence is meas-
ured by the four methods outlined above.

Received May 13, 1955.


References 


1. Darley, J. G. Clinical aspects and interpretation
of the Strong Vocational Interest Blank. New
York: Psychological Corp., 1941.

2. Paterson, D. G. Research studies in individual
diagnosis. Bull. Employment Stabilization
Res. Inst., Univer. of Minnesota, 1934, 3, No. 4.

3. Stordahl, K. E. Permanence of Strong Voca- 
tional Interest Blank scores. J. appl. Psy- 
chol., 1954, 38, 423-427. 

4. Strong, E. K., Jr. Permanence of interest scores
over 22 years. J. appl. Psychol., 1951, 35,
89-91.

5. Strong, E. K., Jr. Nineteen-year followup of en-
gineer interests. J. appl. Psychol., 1952, 36,
65-74.

6. Tilton, J. W. The measurement of overlapping. 
J. educ. Psychol., 1937, 28, 656-662. 


The Journal of Applied Psychology 
Vol. 40, No. 2, 1956


Influence of Scale Administrator on Employee Attitude 
Responses 


Marvin D. Dunnette 


Minnesota Mining & Manufacturing Co., St. Paul, Minnesota 


and Herbert G. Heneman, Jr. 
Industrial Relations Center, University of Minnesota 


Systematic investigation of employee atti- 
tudes is a relatively recent development in 
American business and industry. In recent 
years, employee attitude information has 
proved useful as a means of upward com- 
munications from employees to their employ- 
ers. Thus employee attitude measurements 
have been widely used by employers in as- 
sessing the relative strengths and/or weak- 
nesses of their personnel and employee rela- 
tions programs. As an aid in this direction, 
most employee opinion surveys have adopted 
content that is very broad in its coverage. 
For example, standardized employee attitude 
questionnaires often include specific subscales 
designed to measure attitudes toward such 
factors as hours worked, pay received, type
of work, working conditions, supervision, co- 
workers, employee benefits, job security, op- 
portunities for increased status and promotion, 
adequacy of employment communications, etc. 
Most research questions concerning the writ-
ing of items, the statistical analyses appro-
priate to item retention, and the final stand-
ardization of attitude questionnaires have been 
answered adequately. 

Questions still arise, however, with regard 
to the optimal methods of scale administra- 
tion. It is desirable, obviously, that attitude 
scales be administered in such a way as to 
secure as accurate a picture as possible of 
employees’ “true” attitudes—their actual un- 
biased responses to the statements in the par- 
ticular scale being administered.

One of the more important procedural or 
administration questions involves the general 
problem of respondent anonymity. For most 


1 The authors wish to acknowledge the assistance
of Earl F. Cheit and Thelma Kunde who aided in 
the planning of the study and the administration of 
the questionnaires. 




industrial uses, respondents to attitude scales 
or questionnaires need not be identified. It 
is the purpose of management usually to 
measure the group feeling; knowledge of the 
attitudes of specific individuals is not neces- 
sary in determining the over-all strengths and 
weaknesses leading to the need for adminis- 
trative action. It is common practice, there- 
fore, to emphasize during the scale administra- 
tion that employees need not—indeed, that 
they should not—sign their names to their 
individual questionnaires. Often the admin- 
istration procedure provides for a sealed bal- 
lot box into which employees are asked to 
drop their unsigned questionnaires. It would 
appear that these procedures would be suffi- 
cient to protect the anonymity of individual 
respondents. As a matter of fact, from the 
administrator’s and from the firm’s stand- 
point, it is sufficient. 

However, a more important consideration 
has to do with the impression conveyed to 
employees who are completing the question- 
naire. Hyman (2) has called attention to 
the necessity for distinguishing between literal 
anonymity and psychological anonymity. For 
example, questionnaire studies normally ask 
for such items of information as department 
number, age, length of work experience, etc. 
Requirements for this kind of information 
could easily suggest to an employee that spe- 
cial efforts are being taken to secure identify- 
ing information. 

Furthermore, the employee’s perception of 
the testing situation could have an important 
effect on his view of the use to which the 
attitude results might be put. Situational 
factors which could affect the general cli- 
mate of psychological anonymity might in- 
clude, among other things, the extent to which 
rapport is established between employees and 




Table 1
Comparison of Sex, Age, Experience, and Education Characteristics of Samples A and B

         Sex          Experience (years)     Age (years)            Education (years)
Sample   (% female)   Mean   SD    t         Mean   SD     t        Mean   SD    t
A        82%          4.3    4.8             40.1   17.4            10.6   2.1
                                   1.09                    0.56                  1.00
B        82%          5.9    8.4             42.0   13.4            10.3   1.7


administrator. In this vein, the actual identity 
of the test administrator could have an effect 
on responses obtained. It is this factor—the 
identity of the test administrator—that has 
prompted the present study. 

The Industrial Relations Center (IRC) 
policy in the administration of its Employee 
Attitude Questionnaire (1, 3) has been to as- 
sign responsibility for scale administration to 
an IRC staff member or to some representa- 
tive (not a member of the particular firm) 
appointed by the IRC executive staff. The 
reasoning behind this policy is based on the 
belief that every effort should be taken to ac- 
company literal anonymity with psychologi- 
cal anonymity. 

However, this policy is not universal. Many 
widely used standardized attitude scales do 
not provide, necessarily, for an “external” 
survey administrator. To be sure, question- 
naires usually are unsigned and a ballot box 
is used, but it is rare that any effort is made 
to control possible effects resulting from “in- 
ternal” administration of the scale. 

Recently, the Industrial Relations Center
was afforded an opportunity to conduct a study
directed toward discovering the extent of
attitude-item response distortion attributable
to the identity of the administrator.

It was decided to compare attitude scale 
responses obtained by two different survey ad- 
ministrators, the first an IRC staff member; 
the second, an official of the firm in which 
the survey was being conducted. The major 
hypothesis was that the presence of an offi- 
cial from the firm acting as the survey ad- 
ministrator would constitute more of a threat 
to employees’ feelings of psychological ano- 
nymity than would the presence of an “ex- 
ternal” administrator such as an IRC staff 
member. 

From this major hypothesis, four subhy- 
potheses were formulated for testing. 


1. A threat to employees’ feelings of psy- 
chological anonymity would result in their re- 
sponding more favorably to the survey than 
employees not so threatened. 

2. A threat to psychological anonymity 
would result in differential amounts of re- 
sponse distortion depending upon the con- 
tent of different items comprising the ques- 
tionnaire. 

3. Employees with initially unfavorable at- 
titudes would contribute most to any response 
distortion occurring under the threatening 
condition. 

4. Employees feeling a threat to ano- 
nymity would tend to give fewer and shorter 
responses to open-end questions than em- 
ployees not so threatened. 


Method 


The measuring instrument used in the study was 
the IRC Employee Attitude Questionnaire (1, 3). 
The scale employs Likert-type response categories. 
Scoring is done by either one of two methods: (a) 
the traditional Likert technique of assigning 0, 1, 2, 
3, or 4 points to each response depending on its de- 
gree of favorableness, or (b) by reporting the per- 
centages of responses given in the favorable, unfavor-
able, or undecided categories. Attitudes measured by
the 53 items of the questionnaire are divided among 
seven subscales which have been labeled: General 
Morale, Working Conditions, Type of Work, Hours 
and Pay, Supervision, Co-Workers, and Communi- 
cations. The number of items in these scales varies 
from five in the Communications and Supervision 
areas to 16 in the General Morale Scale. 
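The two scoring methods can be illustrated on a list of responses (one respondent's answers to a subscale, or a whole sample's answers pooled). The sketch assumes each response is already coded 0 to 4 in order of favorableness, and it treats codes 3 and 4 as favorable, 2 as undecided, and 0 and 1 as unfavorable; that grouping of the five Likert categories into three is an assumption, not a rule stated by the IRC. Function names are hypothetical.

```python
def likert_score(responses):
    """Method (a): sum of 0-4 favorableness points across items."""
    if any(r not in range(5) for r in responses):
        raise ValueError("responses must be coded 0-4")
    return sum(responses)


def percent_by_category(responses):
    """Method (b): percentages of favorable, undecided, and unfavorable responses."""
    n = len(responses)
    favorable = sum(1 for r in responses if r >= 3)  # assumed: codes 3-4
    undecided = sum(1 for r in responses if r == 2)  # assumed: midpoint
    unfavorable = n - favorable - undecided          # codes 0-1
    return {
        "favorable": 100 * favorable / n,
        "undecided": 100 * undecided / n,
        "unfavorable": 100 * unfavorable / n,
    }


answers = [4, 3, 2, 1, 0]  # hypothetical five-item subscale
print(likert_score(answers))         # 10
print(percent_by_category(answers))  # {'favorable': 40.0, 'undecided': 20.0, 'unfavorable': 40.0}
```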

In addition to these seven standardized scales, 
three open-end questions are included which read 
as follows: 

1. What do you like best about working for this
Company?

2. If you were made head of this Company, what 
would you do to make this a more pleasant place 
to work? 

3. If you have any more comments to make about 
your job, your supervisor, or the Company, write 
them below. 

The study was undertaken in a large department 
store in the Twin Cities area. A list of all rank- 




and-file employees was obtained from the personnel 
department. Samples were drawn randomly from 
this list to provide subsamples of 45 individuals each. 
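Drawing the two samples can be sketched with Python's standard `random` module. The text does not say how the draw was made; this sketch assumes the two subsamples were mutually exclusive and draws both at once without replacement, so no employee can fall in both. The roster and seed are hypothetical.

```python
import random


def draw_two_samples(roster, size, seed=None):
    """Draw two non-overlapping random samples of `size` employees each."""
    if 2 * size > len(roster):
        raise ValueError("roster too small for two disjoint samples")
    rng = random.Random(seed)
    drawn = rng.sample(roster, 2 * size)  # sampling without replacement
    return drawn[:size], drawn[size:]


employees = [f"employee-{i}" for i in range(300)]  # hypothetical roster
sample_a, sample_b = draw_two_samples(employees, 45, seed=1956)
```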

An Industrial Relations Center staff member ad- 
ministered the Attitude Questionnaire to persons in 
Sample A. The personnel manager (known to all 
employees) of the department store administered the 
questionnaires to persons in Sample B. The meth- 
ods of administration were identical in all other re-
spects. Employees were assured that their replies
were confidential; they were specifically asked not to
sign their names, and they placed their completed
questionnaires in a ballot box which was labeled
PROPERTY OF THE UNIVERSITY OF MIN-
NESOTA.

As a check on the comparability of the samples, 
information concerning sex, age, length of experi- 
ence, and education was obtained. Comparisons 
based on these four factors yielded the data shown 
in Table 1.

Information in Table 1 shows that none of the 
differences between samples is sufficiently large to 
reject the hypothesis that the samples are drawn 
from the same population. 
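The t ratios in Table 1 compare independent samples of equal size. When both groups have n members, the pooled and unpooled standard errors coincide at √((s₁² + s₂²)/n), so t can be recomputed from the printed means and SDs. The sketch below assumes n = 45 per sample; the small gap from the printed value may reflect rounding or a different SD convention.

```python
import math


def t_from_summary(mean_a, sd_a, mean_b, sd_b, n):
    """t ratio for two independent samples of equal size n, from means and SDs."""
    standard_error = math.sqrt((sd_a ** 2 + sd_b ** 2) / n)
    return (mean_b - mean_a) / standard_error


# Age in Table 1: Sample A 40.1 (SD 17.4), Sample B 42.0 (SD 13.4), n = 45 each.
print(round(t_from_summary(40.1, 17.4, 42.0, 13.4, 45), 2))  # 0.58 (printed: 0.56)
```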


Results 


Table 2 shows mean scores, standard devia-
tions, and t values for each of the samples on
each of the subscales of the Employee Atti-
tude Questionnaire.

Mean scores for Sample B (the internally 
administered group) are higher (i.e., more 
favorable) on every subscale than the mean 
scores of Sample A. The differences are sta- 
tistically significant on two of the seven sub- 
scales: General Morale and Supervision. In 
terms of overlap, from 30% to 35% of em- 
ployees in Sample A exceed the median of 
those in Sample B. It appears that, for rank- 
and-file employees at least, a moderate degree 
of distortion has occurred in the expected di- 
rection. The first hypothesis is, therefore, 
supported. 
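The overlap figures above can be approximated from Table 2 if both distributions are taken as normal, with Sample B's mean standing in for its median. Under those assumptions (the authors may instead have counted cases directly), the percentage of Sample A above B's median is 100·(1 − Φ((M_B − M_A)/SD_A)).

```python
from statistics import NormalDist


def percent_above_median(mean_a, sd_a, median_b):
    """Percent of a normal distribution for group A lying above group B's median."""
    return 100 * (1 - NormalDist(mean_a, sd_a).cdf(median_b))


# Table 2 values; Sample B's mean is used as a stand-in for its median.
general_morale = percent_above_median(37.9, 9.5, 42.5)  # about 31
supervision = percent_above_median(13.6, 3.9, 15.2)     # about 34
```

Both estimates fall inside the 30% to 35% range reported in the text.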

In order to test the second hypothesis, the
one concerning differential effects of item
content, seven IRC staff members were asked
to identify items the content of which might
appear to be particularly threatening to an
employee who was not completely convinced
of the anonymity of his responses. Thirteen
items, on which five or more of the staff
members agreed, were selected as comprising
such a “Threat Scale.” These items include
statements directed primarily toward inter-
personal and supervisory relationships with
heavy emphasis on the degree of understand-
ing extended toward employees and the fair-
ness with which employees are treated.
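The selection rule for the Threat Scale (keep an item only when at least five of the seven raters flagged it) can be sketched as a simple tally. Each rater's judgments are represented as a set of item numbers; the ratings in the example are invented for illustration.

```python
from collections import Counter


def items_with_agreement(flags_by_rater, minimum=5):
    """Return item numbers flagged by at least `minimum` raters, in ascending order."""
    counts = Counter(item for flagged in flags_by_rater for item in set(flagged))
    return sorted(item for item, count in counts.items() if count >= minimum)


# Hypothetical flags from seven raters.
raters = [
    {3, 8, 12}, {3, 8}, {3, 8, 20}, {3, 12}, {3, 8}, {8, 20}, {3, 8, 12},
]
print(items_with_agreement(raters))  # [3, 8]
```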


Table 2

Summary Statistics for Attitude Scores Obtained by
Samples A and B of Rank-and-File Employees

                      Sample A                 Sample B
                      IRC-Administered         Internally Administered
                      (N=45)                   (N=45)
Subscale              Mean      SD             Mean      SD              t
General morale        37.9      9.5            42.5      9.4             2.28*
Type of work          17.1      3.0            17.6      3.4             0.76
Co-workers            13.5      3.5            14.0      3.2             0.81
Working conditions    18.9      5.0            19.7      4.8             0.73
Hours and pay         18.6      3.5            19.4      4.2             0.90
Supervision           13.6      3.9            15.2      3.4             2.01*
Communications        11.9      2.9            12.3      3.3             0.71
Total score           131.3     22.8           140.7     23.9            1.90

* Significant at the 5% level.


The IRC staff members also selected items 
which they believed definitely did not in- 
clude “threatening” content. There was good 
agreement on 32 such items. 

Table 3 shows comparisons between mean 
scores obtained by the two groups on the 
Threat and Nonthreat Scales. It is seen that 
the major point at which distortion of re- 
sponses occurs is on the Threat Scale items. 
In terms of overlap, only 33% of Sample A 
rank-and-file employees exceed the median of 
employees in Sample B. The range of scores 
for the former group is 3 to 50 and that for 
the latter group, 22 to 52. Analysis of re- 
sponses to single items shows that the group 
to whom the personnel manager adminis- 
tered the survey gave a larger percentage of 




Table 3

Summary Statistics for Scores Obtained by Employee
Groups on Threat and Nonthreat Scales

         Threat Scale                  Nonthreat Scale
         Sample A     Sample B         Sample A     Sample B
         IRC          Internal         IRC          Internal
Mean     32.3         37.3             79.5         82.5
SD       9.4          7.5              11.4         13.9
t        2.78**                        1.13

** Significant at the 1% level.


favorable responses to every item in the 
threat scale. The percentage differences range 
from a low of 2% to a high of 25%, with a 
median of 13%. Differences in percentage of 
favorable responses on the Nonthreat items 
were about equally divided between positive 
and negative values with a median of only 
1%. 

These data support the second hypothesis. 
The effect of the identity of the administrator 
upon survey responses is most marked among 
certain definite items whose content involves 
attitudes toward various aspects of super- 
visory behavior and the spirit of fair play in 
the work setting. 

The third hypothesis has to do with the 
extent to which different individuals with dif- 
ferent initial attitudes might be prone to give 
responses that are more favorable than war- 
ranted by their actual or “true” attitudes. 
Two methods were used in an effort to test 
this hypothesis. 

First, employees in the two groups were
ranked according to their score on the Threat
Scale. Employees with corresponding ranks
were paired. The ranks were normalized and
the resulting scale scores were correlated with
differences between threat scores for the 45
employee pairs. The correlation between these
variables is −.57. If the two extreme values
at the lower end of the continuum are de-
leted, however, the correlation is reduced to
−.41. These negative correlations suggest
that differences between the two groups tend
to be smaller at the more favorable, or higher, end
of the continuum. However, examination of
the scatter diagram suggested that correla-
tions were raised spuriously by a small clus-
tering of 8 or 10 measures in the middle por-
tion of the continuum. A second method of
analysis was therefore undertaken.
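The first method can be sketched as follows: sort each group's Threat scores, pair the employees holding the same rank, convert ranks to normal deviates, and correlate those deviates with the within-pair differences. The paper does not say how the ranks were normalized; the sketch assumes the common (rank − 0.5)/n proportion passed through the inverse normal distribution, and the scores in the example are hypothetical.

```python
from statistics import NormalDist


def pearson(x, y):
    """Product-moment correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def normalized_rank_scores(n):
    """Normal deviates for ranks 1..n, using (rank - 0.5) / n as the cumulative proportion."""
    unit_normal = NormalDist()
    return [unit_normal.inv_cdf((rank - 0.5) / n) for rank in range(1, n + 1)]


def rank_paired_correlation(scores_a, scores_b):
    """Correlate normalized rank scores with B-minus-A differences for rank-matched pairs."""
    if len(scores_a) != len(scores_b):
        raise ValueError("groups must be the same size to pair by rank")
    pairs = zip(sorted(scores_a), sorted(scores_b))
    differences = [b - a for a, b in pairs]
    return pearson(normalized_rank_scores(len(differences)), differences)


# Hypothetical Threat scores: the gap between groups shrinks toward the high end.
r = rank_paired_correlation([12, 25, 33, 41, 47], [24, 33, 38, 44, 48])
```

A negative r, as in this hypothetical case, corresponds to the pattern the authors describe: smaller between-group differences at the favorable end.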

Rank-and-file employees scoring in the top 
third and bottom third of the total Attitude 
Questionnaire were designated, respectively, 
as having generally favorable and generally 
unfavorable attitudes toward their total work 
situation. Comparisons were then made be- 
tween Threat scores obtained by high-scoring 
persons in Samples A and B and between 
low-scoring persons in Samples A and B. 
The results are shown in Table 4. Both 
comparisons result in differences significant 
at the 5% level. The mean difference be- 
tween the favorable groups is 3.6 points as 
compared to a mean difference of 5.6 points 
for the unfavorable groups. At first glance, 
the “larger” difference between the unfavor- 
able groups appears to lend support to the 
hypothesis being tested. However, these fig- 
ures should be interpreted in terms of the 
variabilities of scores at the two points on the 
attitude continuum. In other words, we need 
to examine the extent of overlap between the 
sets of distributions. In terms of overlap, 
about 13% of the employees in each of the
two IRC-administered samples exceed the 
medians of the firm-administered samples. 
There is no difference, therefore, between the 
amounts of response distortion shown by 
initially favorable and initially unfavorable 
employees. It appears that evidence is not 
sufficient to support a claim that employees 
with unfavorable “true” attitudes will distort 
or shift their responses to a greater extent 
than employees with favorable “true” atti- 
tudes. 


Table 4

Summary Statistics for Threat Scores Obtained by
Employees in Samples A and B Who Have
Favorable and Unfavorable Attitudes

         Favorable Over-all Attitudes      Unfavorable Over-all Attitudes
         Sample A       Sample B           Sample A       Sample B
         IRC            Internal           IRC            Internal
         (N=16)         (N=15)             (N=15)         (N=15)
Mean     40.9           44.5               23.7           29.3
SD       3.7            3.7                8.4            4.7
t        2.59*                             2.18*

* Significant at the 5% level.




The final hypothesis involved the incidence 
of “open-end” responses among the two sam- 
ples. This effect was determined simply by 
counting the number of employees choosing 
to respond to each of the three open-end ques- 
tions. Results are shown in Table 5. 

In every case, a larger proportion of per- 
sons in Sample A (the IRC-administered 
group) responded to the open-end questions 
than in Sample B. All of the differences are 
statistically significant. Furthermore, the av- 
erage length of each comment given by the 
former group was 37 words as compared with 
an average of only 24 words for each com-
ment by the latter group. These two factors
combined, with the result that
the IRC-administered group produced exactly 
twice as much verbal material as the group 
to whom the firm’s personnel manager ad- 
ministered the questionnaire. This is strik- 
ing support in favor of the fourth and final 
hypothesis. It appears that the identity of 
the administrator can have an important ef- 
fect on the desire of employees to respond 
freely to open-end questions. 
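The t values for the open-end questions appear to be differences between proportions divided by an unpooled standard error. Recomputing them with 45 employees per sample, as in the sketch below, gives 2.75, 1.97, and 3.04 against the printed 2.76, 1.98, and 3.04. For the first question, Sample A's proportion is taken as 76 per cent, the value consistent with the printed t and with the statement that Sample A responded more often in every case. The unpooled formula is an inference, not a method stated by the authors.

```python
import math


def proportion_difference_ratio(p_a, p_b, n_a, n_b):
    """Difference between two independent proportions divided by its unpooled standard error."""
    standard_error = math.sqrt(p_a * (1 - p_a) / n_a + p_b * (1 - p_b) / n_b)
    return (p_a - p_b) / standard_error


# Proportions responding (Sample A, Sample B), 45 employees per sample.
questions = {
    "like best": (0.76, 0.49),
    "head of company": (0.69, 0.49),
    "other comments": (0.44, 0.16),
}
for label, (p_a, p_b) in questions.items():
    print(label, round(proportion_difference_ratio(p_a, p_b, 45, 45), 2))
# like best 2.75, head of company 1.97, other comments 3.04
```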

It has been the IRC’s experience that man- 
agement officials in the firms regard the ver- 
batim comments from their employees as es- 
pecially valuable supplementary evidence to 
the scores obtained on the standardized scales. 
In view of this, the finding with regard to 
willingness to answer open-end questions may 
well be the most relevant of the entire study. 

This study has led the IRC staff to con- 
clude, for the time being at least, that the 
policy of using staff members or staff repre- 
sentatives to administer the Triple Audit 
Questionnaire is well founded and that it 
should be continued. It appears that the ne-
cessity for distinguishing between literal ano-
nymity and psychological anonymity has been
confirmed, and a few steps have been taken in 
defining the attitude areas which may be most 
susceptible to the subtle effects of different 
survey administrators. The study suggests, 
particularly, that any survey directed toward 
securing employee attitude information bear- 
ing on interpersonal and supervisory relation- 
ships should take special pains to engender 
feelings of complete psychological anonymity 
on the part of the employees participating in 
the survey. 


Table 5

Proportions of Persons Responding to
Open-End Questions

          What do you like best     What would you do if you were     Other comments
          about working here?       made head of this company?
Samples   (% responding)            (% responding)                    (% responding)
A         76%                       69%                               44%
B         49%                       49%                               16%
          t = 2.76**                t = 1.98*                         t = 3.04**

* Significant at the 5% level.
** Significant at the 1% level.


Summary and Conclusions 


A study was conducted to discover the 
effects on attitude scale responses of the 
identity of the survey administrator. Two 
employee samples were selected randomly 
from the total work force of a large Twin 
Cities department store. An Industrial Re- 
lations Center staff member administered the 
IRC Employee Attitude Questionnaire to one 
of the groups; the personnel manager of the 
store administered the questionnaire to the 
other group. 

The results gave support to the following 
hypotheses: 

1. A threat to employees’ feelings of ano- 
nymity results in their responding more favor- 
ably to an attitude survey than employees 
not so threatened. 

2. A threat to anonymity results in differ- 
ential amounts of response distortion depend- 
ing upon the content of different items com- 
prising the questionnaire. 

3. Employees feeling a threat to ano- 
nymity tend to give fewer and shorter re- 
sponses to open-end questions than employees 
not so threatened. 


Received June 10, 1955. 


References 


1. Heneman, H. G., Jr., & Yoder, D. Employee
opinion survey by remote control. Personnel 
J., 1953, 32, 169-172. 

2. Hyman, H. H. Interviewing and social research. 
Chicago: Univer. of Chicago Press, 1954. 

3. Yoder, D., Heneman, H. G., Jr., & Fox, H.
Auditing your manpower management. Min- 
neapolis: Univer. of Minnesota Press, May, 
1954. 


The Journal of Applied Psychology 
Vol. 40, No. 2, 1956 




Attitudes of Union and Non-Union Employees in a Calcutta 
Electrical Engineering Factory 


Harish Chandra Ganguli 


Indian Institute of Technology, Kharagpur


Every worker in a factory is subject to cross pressures for loyalty to two membership groups, the company and the labor union. It is also to be expected that a worker who is a member of the union perceives, judges, and feels about things in the work situation in which the union is interested in a different way from a worker who has not joined the union. Largely because of this influence of the group on the norms and attitudes of its members, the morale of the worker, meaning thereby his contentment with the employment relationship, is also likely to be affected by union membership.


The Study


A morale survey was made by the author in an important engineering factory in Calcutta which produced all types of electric fans. In 1951 it had 1,890 workers on its roll and produced 62,188 electric fans, equivalent to 29.3 per cent of the country's total fan production. Since 1945 the factory has had a labor union controlled by extreme leftist trade unionists. This union (to be referred to as the outside union or simply the union) is not recognized by the company, although it has a substantial hold over the workers, having fought several successful and partly successful battles for them in the past. In 1952 about 55 per cent of the workers were members. The company has tried to help the development of a rival union in the factory, which is commonly known amongst the workers as the "inside" union. Its membership is limited, however, to about 8 per cent of the workers, the rest being unattached to either of these two organizations.

The morale survey involved all workers from the foundry shop, the machine shop, and one assembly shop, totaling 550 workers. The method of study closely resembles the sample interview survey technique (6), though on a much smaller scale. An attitude scale on the principle of summated ratings (5) was constructed on the basis of initial exploratory interviews with 40 workers and four pretests. The data regarding satisfaction of the workers were obtained from systematic interviews with them on the basis of this scale. The attitudinal items had fixed alternative answers, and the relevant response alternatives were checked by the interviewer during a free-answer interview with each worker. Each interview took between one and one-and-a-half hours to complete. In the construction of the scale and conduct of the interviews the usual precautions were taken for ensuring the validity of the results (29).
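A summated-rating scale of the kind cited (5) scores each respondent by adding up item weights. The sketch below assumes the simplest integer weighting, 1 to k per item, with negatively worded items reversed so that a high total always means high satisfaction; the number of alternatives, the weights, and the example responses are assumptions, not details reported for this scale.

```python
def summated_rating(responses: list[int], reverse_keyed: set[int], k: int = 5) -> int:
    """Summated-rating (Likert-type) score for one respondent.

    responses[i] is the alternative checked for item i, coded 1..k so that a
    higher code means stronger agreement with the statement as worded. Items
    whose indices are in reverse_keyed are worded unfavourably, so their code
    is flipped to k + 1 - code before summing.
    """
    total = 0
    for i, code in enumerate(responses):
        if not 1 <= code <= k:
            raise ValueError(f"item {i}: code {code} is outside 1..{k}")
        total += (k + 1 - code) if i in reverse_keyed else code
    return total


# Hypothetical respondent: three items, the second worded unfavourably.
print(summated_rating([5, 1, 3], reverse_keyed={1}))  # prints 13
```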
Results 


The preliminary scale contained 41 items. 
To identify the primary morale dimensions that
were covered by these items, a factor analy- 
sis by Thurstone’s centroid method was made 
with 23 of the best items in the scale. These 
items were selected on the basis of discrimina- 
tion values and item analysis. The factor 
analysis led to the isolation of three factors 
or dimensions of morale. The first factor, 
Factor C, refers to the worker’s satisfaction 
with the benefits he derives from the employ- 
ment relationship and also his over-all confi- 
dence and satisfaction with the total organi- 
zation. It refers to issues like how the fac- 
tory compares with other factories as a place 
to work, if the company is generally sympa- 
thetic to the workers and appreciative of their 
point of view, and also to specific issues like 
income, the welfare activities of the com- 
pany, the effectiveness of supervision, etc. 
Factor So gives a measure of the workers’ 
satisfaction with the technical and organiza- 
tional aspects of supervision, and has refer- 
ence to issues like division of work load, at- 
tention given to their suggestions regarding 
methods of work, tools, etc., providing adequate facilities for the men to do their work properly, and so on. Factor Sp refers to the satisfaction the worker derives from his relations with the supervisor as a person. The supervisor's skill in handling his men, the reasonableness of what he expects from them, and, in general, whether he can be regarded as their “own man” are important points in this connection.


Table 1


Distribution of Scores of 548 Workers on the Three Morale Factors


Factor C              Factors So and Sp
Score    No. of       Score    No. of workers    No. of workers
         workers               (Factor So)       (Factor Sp)
8-12     104          5-7      24                39
13-17    266          8-10     103               113
18-22    132          11-13    158               193
23-27    35           14-16    154               181
28-32    11           17-19    109               22


Table 2


Percentage of Workers Obtaining Different Scores on Factor C with the
Different Union Groups Treated Separately


Score    Inside    Non-Union    Outside
8-12     6.5       10.8         26.5
13-17    26.1      44.6         54.7
18-22    41.3      30.4         17.1
23-27    15.2      11.3         1.7
28-32    10.9      2.9          0.0
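Thurstone's centroid method is only named in the text. As a rough sketch of its first extraction step: each item's loading on the first factor is its column sum in the correlation matrix divided by the square root of the matrix's grand total, and the residual matrix R − aaᵀ feeds the next extraction. The communality estimate (highest absolute correlation in each column), the omission of sign reflection before later factors, and the 3 × 3 example matrix are all assumptions for illustration; the study itself analysed 23 items.

```python
import math


def first_centroid_factor(r: list[list[float]]) -> tuple[list[float], list[list[float]]]:
    """Extract one centroid factor from a symmetric correlation matrix.

    The diagonal is replaced by the highest absolute off-diagonal correlation
    in each column (a common communality estimate). Loadings are column sums
    divided by the square root of the grand total, and the residual matrix
    R - a a^T is returned for the next extraction. Sign reflection, normally
    applied before extracting later factors, is omitted here.
    """
    n = len(r)
    m = [row[:] for row in r]
    for j in range(n):
        m[j][j] = max(abs(m[i][j]) for i in range(n) if i != j)
    col_sums = [sum(m[i][j] for i in range(n)) for j in range(n)]
    total = sum(col_sums)
    if total <= 0:
        raise ValueError("grand total must be positive; reflect variables first")
    loadings = [s / math.sqrt(total) for s in col_sums]
    residual = [[m[i][j] - loadings[i] * loadings[j] for j in range(n)]
                for i in range(n)]
    return loadings, residual


# Hypothetical 3-item correlation matrix (not the study's data).
R = [[1.00, 0.56, 0.48],
     [0.56, 1.00, 0.42],
     [0.48, 0.42, 1.00]]
loadings, residual = first_centroid_factor(R)
print([round(a, 3) for a in loadings])
```

By construction the residual matrix sums to zero, which is a convenient check that the extraction was carried out correctly.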

For each worker, therefore, measures for 
these three types of satisfaction have been 
obtained. Table 1 gives the distribution of 
scores on these factors for the 548 workers.¹

For the three union groups separately, 
Tables 2 and 3 give the percentage of work- 
ers in each score interval in Factors C, So, 
and Sp. 

On analysis it was found that in each mo- 
rale dimension the outside union group was 
most dissatisfied, the inside union group was 
most satisfied, and the non-union group oc- 
cupied a middle position between these. This 
is seen from Table 4. Also, t tests show that the difference in satisfaction between any two union groups on each morale factor was significant at the 1 per cent level except for the
difference between inside union and non-union 
groups in Factor So which is significant at the 
5 per cent level only. 
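The pairwise comparisons can be approximated from summary statistics alone. The sketch below reads Table 4 as mean ± standard deviation and takes the group sizes (46, 202, 294) from the footnote on the frequencies used from Table 4 onwards. It uses a pooled-variance two-sample t, which is an assumption, since the article does not state the exact formula; small discrepancies from the reported significance levels (for example, inside union vs. non-union on Factor So) are therefore possible.

```python
import math
from itertools import combinations

# Table 4 read as (mean, SD); group sizes from the footnote.
TABLE_4 = {
    "Inside union":  {"So": (15.5, 2.62), "Sp": (14.4, 2.13), "C": (19.9, 5.27)},
    "Non-union":     {"So": (14.4, 2.58), "Sp": (13.2, 2.48), "C": (17.6, 4.68)},
    "Outside union": {"So": (11.8, 2.99), "Sp": (11.3, 2.35), "C": (14.7, 3.54)},
}
N = {"Inside union": 46, "Non-union": 202, "Outside union": 294}


def pooled_t(m1: float, s1: float, n1: int, m2: float, s2: float, n2: int) -> float:
    """Two-sample t from summary statistics, pooling the two variances."""
    pooled_var = ((n1 - 1) * s1 ** 2 + (n2 - 1) * s2 ** 2) / (n1 + n2 - 2)
    return (m1 - m2) / math.sqrt(pooled_var * (1 / n1 + 1 / n2))


def all_pairwise_t() -> dict[tuple[str, str, str], float]:
    """t for every pair of groups on every factor (first group minus second)."""
    out = {}
    for g1, g2 in combinations(TABLE_4, 2):
        for factor in ("So", "Sp", "C"):
            m1, s1 = TABLE_4[g1][factor]
            m2, s2 = TABLE_4[g2][factor]
            out[(g1, g2, factor)] = pooled_t(m1, s1, N[g1], m2, s2, N[g2])
    return out


for (g1, g2, factor), t in all_pairwise_t().items():
    print(f"Factor {factor}: {g1} vs {g2}: t = {t:.2f}")
```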

To find out whether this relation between membership of the unions and the satisfaction of the workers was genuine, the relationship was analysed further by controlling for other variables that influenced worker satisfaction. The variables controlled were supervisory group, nature of work, income, pay increase in the last three years, financial aspiration, length of service, schooling of the workers, etc. Thus, for example, the relation between union membership and confidence in the company (Factor C) has been determined by statistically comparing outside union and non-union workers (inside union workers were too few) under the same supervisor, doing the same type of work, and in the same income group. Such analyses showed that in almost every case the inside union workers were most satisfied and the outside union workers least satisfied, with the non-union workers neither very satisfied nor very dissatisfied. With regard to attitude toward supervision, it was also noted that in comparatively ill-managed shops the discrepancy between attitudes of union and non-union workers was greater. The effectiveness of supervision thus seemed to act as an intervening variable.

¹ Data for two workers were incomplete and therefore ignored. Because of lack of background data for six others, most statistics from Table 4 onwards have been calculated on the following frequencies: Inside Union, 46; Non-Union, 202; Outside Union, 294. These samples are roughly proportionate to the membership of the three union groups in the whole factory.

A further comparison of the different un- 
ion groups has been made regarding the ex- 
tent of their dissatisfaction in general. Ow- 
ing to shortage of space, however, differences 
over specific issues are not discussed. Such 
an over-all comparison of these union groups 
from the same factory is to some extent justi- 
fied since they were subject to the same type 
of company policies and practices and the 
same personnel policies. These groups were 
also comparable to some extent regarding the 
work they did and the general quality of supervision, since in each of the three shops (foundry, machine shop, and assembly lines) there were some inside union workers, some outside union workers, and some non-union workers.


Table 3


Percentage of Workers Obtaining Different Scores on Factors So and Sp with the
Different Union Groups Treated Separately


         Factor So (% of workers)          Factor Sp (% of workers)
Score    Inside   Non-Union   Outside      Inside   Non-Union   Outside
5-7      0.0      1.0         7.4          0.0      2.0         11.7
8-10     8.7      6.4         28.9         6.5      13.7        27.5
11-13    8.7      26.5        33.5         21.7     33.8        38.3
14-16    34.8     37.2        20.8         58.7     46.1        20.1
17-19    47.8     28.9        9.4          13.0     4.4         2.4


Assuming that attitude was normally distributed, the range of morale scores along each factor was divided into five segments of 1.2σ each. These five subgroups may be regarded as categories of satisfaction, ranging from the “most dissatisfied” category to the “most satisfied” category. Table 5 below gives the percentage of workers in each satisfaction group, union-wise.

Table 5 shows that there were always more outside union workers than non-union workers in the dissatisfied groups. Inside union workers, on the other hand, were more satisfied than the other two groups. Also, within any one union group the percentages of satisfied and dissatisfied workers were similar across all three morale dimensions. About 40 per cent of outside union workers were dissatisfied on each morale dimension, while about 40 per cent of non-union workers and 60 per cent of inside union workers were satisfied.


Table 4


Means and Standard Deviations of Scores on the Three
Factors for the Three Union Groups Treated Separately


Group            Factor So      Factor Sp      Factor C
Inside union     15.5 ± 2.62    14.4 ± 2.13    19.9 ± 5.27
Non-union        14.4 ± 2.58    13.2 ± 2.48    17.6 ± 4.68
Outside union    11.8 ± 2.99    11.3 ± 2.35    14.7 ± 3.54


Discussion

This study was originally intended to de- 
termine variations in industrial morale and 
factors responsible for these variations. Very 
soon, however, it was realized that the union 
membership of some workers and nonattach- 
ment to it of others was the most important 
single factor related to their job contentment. 
Significantly negative biserial correlations between the three morale factors and the dichotomy “outside union membership / no union membership” were found. These are:


With confidence in company, etc. (Factor C)                  −0.410 ± 0.049
With technical aspects of supervision (Factor So)            −0.500 ± 0.045
With human relations aspects of supervision (Factor Sp)     −0.396 ± 0.049
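For reference, the textbook biserial coefficient and its usual large-sample standard error are sketched below; the article reports only the resulting coefficients, so the function names and the small example are hypothetical. With the footnote's group sizes of 294 outside union and 202 non-union workers, `biserial_se(-0.410, 294, 202)` comes out at about 0.049, in line with the value reported for Factor C, although the author's exact computation is not stated.

```python
import math
from statistics import NormalDist, fmean, pstdev


def _normal_ordinate(p: float) -> float:
    """Height y of the standard normal curve at the point cutting off proportion p."""
    return NormalDist().pdf(NormalDist().inv_cdf(p))


def biserial_r(scores: list[float], member: list[bool]) -> float:
    """Biserial r between a continuous score and a dichotomy assumed to
    reflect an underlying normal variable:
        r_bis = (M1 - M0) / s_t * (p * q / y)
    where p is the proportion coded True, q = 1 - p, s_t is the SD of all
    scores, and y is the normal ordinate at the point dividing p from q."""
    ones = [s for s, m in zip(scores, member) if m]
    zeros = [s for s, m in zip(scores, member) if not m]
    p = len(ones) / len(scores)
    q = 1 - p
    return (fmean(ones) - fmean(zeros)) / pstdev(scores) * (p * q / _normal_ordinate(p))


def biserial_se(r: float, n1: int, n0: int) -> float:
    """Approximate standard error of biserial r: (sqrt(p*q)/y - r**2) / sqrt(N)."""
    n = n1 + n0
    p = n1 / n
    return (math.sqrt(p * (1 - p)) / _normal_ordinate(p) - r ** 2) / math.sqrt(n)


# Hypothetical scores: members (True) score lower, so r_bis is negative.
print(round(biserial_r([10, 12, 14, 16, 18, 20], [True, True, True, False, False, False]), 3))
print(round(biserial_se(-0.410, 294, 202), 3))
```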


This inverse relation was found to exist even when the analysis was made keeping constant other variables like nature of supervision, nature of work, etc. An over-all view of this influence is shown by the fact that about 60 per cent of inside union workers and 40 per cent of non-union workers, in contrast to only 12 per cent of outside union employees, expressed satisfaction in the three areas of their employment relationship. An analysis of attitudes on individual issues showed that the maximal difference in satisfaction between the outside union group and the other two groups existed in connection with (a) relations with one's supervisors and allocation of the work load, and (b) the treatment received from the company and how this factory compared with other similar factories as a place to work.


Table 5


Percentage of Outside, Inside, and Non-Union Employees in Different Satisfaction Subgroups


Area and group          Most dissatisfied   Dissatisfied   Neutral   Satisfied   Most satisfied
Technical aspects of supervision (Factor So)
  Outside union         6.0                 35.6           45.7      12.1        0.6
  Non-union             0.5                 10.3           48.0      35.3        5.9
  Inside union          0.0                 8.7            30.4      50.0        10.9
Human relations aspects of supervision (Factor Sp)
  Outside union         5.0                 33.9           48.3      10.8        2.0
  Non-union             2.0                 13.2           43.1      37.7        4.0
  Inside union          0.0                 6.5            34.8      47.8        10.9
Satisfaction from wages, etc., and with the total organization (Factor C)
  Outside union         4.7                 32.6           49.0      13.4        0.3
  Non-union             2.0                 15.2           43.1      34.8        4.9
  Inside union          2.2                 6.5            30.4      43.5        17.4
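The five satisfaction subgroups of Table 5 come from cutting each factor's score range into 1.2σ bands. The sketch below assumes the bands are centred on the mean of all workers on that factor (cut points at ±0.6σ and ±1.8σ), which the text does not state explicitly; under a normal distribution this places about 45 per cent of workers in the neutral band, about 24 per cent in each adjacent band, and about 3.6 per cent in each extreme band.

```python
from statistics import NormalDist

# Assumption: five 1.2-sigma bands centred on the mean, cut at these z values.
CUTS = (-1.8, -0.6, 0.6, 1.8)
LABELS = ("Most dissatisfied", "Dissatisfied", "Neutral", "Satisfied", "Most satisfied")


def satisfaction_category(score: float, mean: float, sd: float) -> str:
    """Assign a morale score to one of the five 1.2-sigma bands."""
    z = (score - mean) / sd
    for cut, label in zip(CUTS, LABELS):
        if z < cut:
            return label
    return LABELS[-1]


def expected_proportions() -> dict[str, float]:
    """Share of a normal distribution that falls in each band."""
    nd = NormalDist()
    edges = (float("-inf"),) + CUTS + (float("inf"),)
    return {label: nd.cdf(hi) - nd.cdf(lo)
            for label, lo, hi in zip(LABELS, edges, edges[1:])}


for label, share in expected_proportions().items():
    print(f"{label:18s} {share:.3f}")
```

Comparing these normal-curve shares with the observed percentages in Table 5 shows how far each union group departs from the factory-wide distribution.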

It is not known and cannot be exactly 
known from the present study how far the 
union (outside) is responsible for this greater 
dissatisfaction among its members. That it 
made a significant contribution to such a 
condition may be inferred from the enormous 
efforts it made by means of mass rallies, 
pamphlets, wall placards, through informal 
group discussions and on a national level, 
through its newspaper to spread dissatisfac- 
tion among its workers with their wages and 
conditions of service and with the generally 
capitalistic and antiworker attitude, policies, and practices of the management. But it was also true that many workers who joined the union, especially in its early formative stage, were somewhat more dissatisfied and anticompany in their attitude than the average. Therefore this relation between poor morale of the worker and his union membership seemed to be more correctly expressed as a circular causal relation: each was to some extent a cause of the other and was in turn affected by it.

That the results of the present study are 
not altogether peculiar to the specific plant 
is supported by a similar study conducted by 
the author in another important light engineering factory in Calcutta (2, 3). This factory had only one union, which was also Communist-dominated and not recognized by the
company. The results showed that whereas 
38 per cent of non-union employees were gen- 
erally satisfied with their employment rela- 
tionship, only 23 per cent of the union mem- 
bers were satisfied. Analysis showed this 
negative relation to be present even when 
factors like supervision, earnings, and educa- 
tion of workers were kept constant. Attitudes 
between the two groups over specific issues 
varied in the same way as for workers in the 
present study. 

Union membership and poor morale need 
not be associated necessarily. In a study 
made in the United States and reported by 
Kolstad (4), for example, it was found that 
there were very few significant differences 
when the attitudes of union and non-union 
employees were compared. One difference concerned pay, an item covered by union contract. Fifty-two per cent of non-union workers, as compared with only 30 per cent of union workers, were dissatisfied with it, and 71 per cent of union employees, as compared with 42 per cent of non-union employees, thought that they got as high or higher pay than they would get for similar work in other factories.
Non-union employees as a group, however, 
felt more definitely that they “belonged,” 
that they were part of the organization. 

The important factors in this context seem 
to be the attitude of management toward la- 
bor organizations, the policies and practices 
of these organizations, and the resulting degree of cooperation between the two. In
India many managements do not feel that 
labor has any right or grounds to organize 
as evidenced by the many unrecognized un- 
ions. Many unions, especially those affili- 
ated with some extreme leftist organizations, 
do not always follow a policy dictated by the 
workers’ interests alone and thus cooperation 
between labor and management in the form 
of collective bargaining, etc. is conspicuous 
only by its absence. The degree to which 
these characteristics are present in any fac- 
tory determines the association between un- 
ion membership and poor worker morale, and 
they were largely present in the two factories 
studied; hence the results as noted above. 
Incidentally, the very large difference of 
opinion between union and non-union em- 
ployees over aspects of supervision empha- 
sizes its importance for the worker as pointed 
out by Roethlisberger (8), and also points to 
the possibility that many supervisors adopt 
management goals as opposed to union goals 
for their frame of reference. Again, the sub- 
stantial difference in the way these two groups 
evaluate general issues like the company’s 
treatment of and sympathy for the worker, 
its wage payment policy, and the absence of any such difference over specific issues like satisfaction with the nature of work done, wages received, and so on, seem to be another illustration of the principle that social influences are more powerful in ambiguous and vague situations than in a clearly structured situation.


Received June 14, 1955. 


References 


1. Anonymous. Human relations study techniques. 
Ann Arbor: Survey Research Center, Institute 
for Social Research, 1949. 

2. Ganguli, H. C. A study on effect of union mem- 
bership on industrial morale. Indian J. Psy- 
chol., 1954, 29, 45-60. 

3. Ganguli, H. C. Relation of union membership to 
attitude of industrial workers. Indian J. soc. 
Wk, 1954, 15, 189-199. 

4. Kolstad, A. Excerpts from employee attitude survey. In M. L. Blum (Ed.), Readings in experimental industrial psychology. New York: Prentice-Hall, 1952. Pp. 114-117.

5. Likert, R. A technique for the measurement of 
attitudes. Arch. Psychol., N. Y., 1932, No. 
140. 

6. Likert, R. The sample interview survey. In W. 
Dennis (Ed.), Current trends in psychology. 
Pittsburgh: Univer. of Pittsburgh Press, 1947. 
Pp. 196-225. 

7. Marriott, R. Some problems in attitude survey 
methodology. Occup. Psychol., Lond., 1953, 
27, 117-127. 

8. Roethlisberger, F. J. The foreman: master and
victim of double talk. In S. D. Hoslett (Ed.),
Human factors in management. London: 
Macdonald & Evans, 1948. Pp. 51-72. 

9. Vernon, P. E. The assessment of psychological 
qualities by verbal methods. Industr. Hlth 
Res. Bd. (Lond.), 1952, Rep. No. 83. 




Prediction of Success from Examination of Performance 
During the Training Period 


Patricia C. Smith and Robert A. Gold 


Cornell University 


At what point in the learning period can 
ultimate success on the job be predicted? 
This question arises whenever probationary 
periods must be set, or whenever a foreman 
must decide whether a particular learner 
should continue training, be transferred, or 
be dropped. Frequently the problem is 
whether actual performance for a given week 
is a better predictor of eventual success than
a battery of tests used in selection. 

There is evidence that the abilities impor- 
tant in job performance after the end of the 
learning period are frequently different from 
those of the learning period. Investigations 
of the predictive value of tests for trainability 
and job proficiency show that shifts in apti- 
tude requirements may take place. For ex- 
ample, Brown and Ghiselli (2) correlated 
pairs of validity coefficients obtained from a 
number of studies where both job proficiency 
and trainability criteria had been used in 
validating the same kinds of tests for similar 
jobs. Correlations between validity coeffi- 
cients using the two criteria were low. This 
was surprisingly true even for clerical jobs 
where verbal, numerical, perceptual, and rea- 
soning abilities could be expected to be im- 
portant during and after learning. Evidence 
of a more direct nature of the relationship of 
early performance to eventual achievement is 
found in Kornhauser’s study of billing-ma- 
chine operators for whom the shift could be 
expected to be small (4). Even so, his cor- 
relations were not high, especially for early 
parts of the learning period. 

The most desirable criterion would be pro- 
duction after completion of learning, but 
previous studies have used either time to 
reach average production (1, 5) or total of 
production during training (1). Even with 
these criteria that reflect performance during 
learning, prediction based on the early weeks 
of the learning period did not prove to be 


very accurate. Table 1 summarizes the re- 
sults of these earlier studies. 

The following hypotheses concerning the 
correlations between production at various 
stages of learning and eventual productivity 
for power sewing-machine operators were 
tested in the present study: 


1. They are low during the first weeks of 
initial learning, when the learning curve is 
rising most rapidly. 

2. They rise steadily throughout the learn- 
ing period. 

3. They are lower than those found in 
other studies in which performance during 
learning entered into the criterion figures. 

4. They are lower than those obtained for 
clerical workers, for whom the abilities re- 
quired during learning may be expected to be 
more similar to those required after training. 


Procedure 


Production data were gathered for the first 22 
weeks of employment of 18 women, previously un- 
trained in power sewing-machine operation. They 
hemmed handkerchiefs, using specialized folder at- 
tachments. All operations were nearly identical. 
Payment was by piece rate, with a guaranteed minimum. Employment was seasonal, and the workers
were employed over almost exactly the same period 
of time. There had been almost no selection, and 
there were very few terminations prior to the end 
of the season, so that the range is unusually unre- 
stricted. All learning curves had leveled by the end 
of the eighteenth week. There was no increase in 
the average learning curve after Week 19, and in 
only one case was production for Weeks 21 or 22 
more than one per cent greater than that for Weeks 
19 or 20. In this exception (a difference of 8 per cent), production after Weeks 21 and 22 dropped back to the level of Weeks 19 and 20. The average
curve showed that the first sharp rise was com- 
pleted by Week 5. 

Average hourly production for Weeks 19 through 
22 was taken as a criterion of performance after 
learning. Rank correlations were computed between 
this criterion and figures for each week from Week 
3, when complete production records were first available, through Week 10, when half the learning period had been completed and the average learning curve had reached 85 per cent of the terminal production rate.

Average production for each two-week period and for each three-week period during the first 10 weeks was also correlated with the criterion.


Table 1


Correlations Between Production at Various Stages of Learning and Different Criteria of
Proficiency after Learning, Previous Studies


Preparation of spools for rug looms. McGehee (5). Figure used: average production for single week. Weeks to end of learning period: 46. Criterion: time to reach average production.
    Percentage of learning period elapsed    2%     4%     7%     9%     11%    13%    15%    17%
    Correlation with criterion               .599   .836   .760   .792   .833   .629   .842   .679

Billing-machine operators. Kornhauser (4). Figure used: speed of operation. Weeks to end of learning period: 26*. Criterion: speed at end of learning period.
    Percentage of learning period elapsed    17%    33%    50%    67%    83%
    Correlation with criterion               .20    .48    .60    .76    .76

Billing-machine operators. Kornhauser (4). Figure used: accuracy of operation. Weeks to end of learning period: 26*. Criterion: accuracy at end of learning period.
    Percentage of learning period elapsed    17%    33%    50%    67%    83%
    Correlation with criterion               .00    .25    .31    .58    .74

Preparation of spools for rug looms. McGehee (5). Figure used: average of cumulative production through week. Weeks to end of learning period: 46. Criterion: time to reach average production.
    Percentage of learning period elapsed    2%     4%     7%     9%     11%    13%    15%    17%
    Correlation with criterion               .599   .834   .842   .887   .804   .929   .924   .923

Power sewing-machine operation. Blankenship and Taylor (1). Figure used: average of cumulative production through week. Criteria: time to reach average production; total production during training.
    Operation    Weeks to end      Percentage of learning    Time to reach         Total production
                 of learning       period elapsed            average production    during training
    Covering     50**              20%                       (illegible)           (illegible)
    Hemming      40**              25%                       .32                   .66
    Trimming     30**              33%                       .66                   .68

* Estimated.
** Estimated from learning curves for both workers completing learning period and those not completing, and from reported difficulty of job.


Results and Discussion


Table 2 shows a summary of results. The first statistically significant correlations are for Week 5, or for composites including Week 5. This was the point at which the initial rise of the learning curve had been
completed. The foreman who wishes to make 
his decision on the basis of a single week’s 
production, or the average of two or three 
weeks’ production, is probably justified in 
doing so after the completion of a fourth of 
the learning period, since correlations in this 
study, as in all previous studies (see Table 
1), reach a significant level by that time. 
Furthermore, the correlations at that point 
probably exceed those of most test batteries 
with eventual production. Decisions based 
on cumulative data may, in some cases, be 
warranted slightly prior to that point, but 
caution should be exercised when less than 
20 per cent of the learning period has elapsed, 
particularly if there is any contradictory evi- 
dence to suggest a delay, or if training has 
been expensive. 

The correlations in this, as in preceding 
studies, rise steadily with time, suggesting 
that there are more factors in common in the 
later portions of the learning period. 

The correlations are substantially higher 
than those for Kornhauser’s clerical workers, 
and lower than those of McGehee and of 
Blankenship and Taylor for factory workers. 
These differences are probably too great to 
be attributable to errors in estimation of 
lengths of learning periods. There is no in- 
dication of greater change in the ability pat- 
terns involved in learning motor skills than 
in acquiring clerical skills. The obtained 
hierarchy might better be accounted for by 
differences in the extent to which learning 
speed appears in the various criteria. Since 
Kornhauser’s criteria were further removed 
from the training period than ours, and ours 
are correspondingly less contaminated with 
effects of learning speed than those of the 
other two studies, it seems likely that cor- 
relations as low as Kornhauser’s might be 
obtained in the other situations after a longer 
lapse of time. 

The lack of evidence for a lower relationship between learning and job performance scores for motor than for clerical workers is in line with the studies of validity by Brown and Ghiselli (2), who found correlations between validities for trainability and job proficiency to be as low for clerical as for other kinds of jobs. Patterns of ability required appear to shift to as great an extent in one situation as in the other.


Table 2


Correlations Between Production at Various Stages of
Learning and Average Production at End of
Learning Period, for Hemming Handkerchiefs


                                 Percentage of 20.5    Rank correlation with
Figure used           Weeks      weeks' learning       average at end of
                                 time elapsed          learning period
Single week           3          15%                   .46
                      4          20%                   .40
                      5          24%                   (illegible)*
                      6          29%                   .73**
                      7          34%                   .74**
                      8          39%                   .78**
                      9          44%                   .81**
                      10         49%                   .80**
Average, two weeks    3, 4       20%                   .46
                      4, 5       24%                   .48*
                      5, 6       29%                   (illegible)
                      6, 7       34%                   (illegible)
                      7, 8       39%                   .79**
                      8, 9       44%                   .81**
                      9, 10      49%                   .82**
Average, three weeks  3, 4, 5    24%                   .48*
                      4, 5, 6    29%                   .60**
                      5, 6, 7    34%                   .68**
                      6, 7, 8    39%                   .78**
                      7, 8, 9    44%                   .81**
                      8, 9, 10   49%                   .83**

* Rho significant at 5% level (3).
** Rho significant at 1% level (3).
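Table 2 reports rank correlations (Spearman's rho) between each week's production and the criterion, average hourly production for Weeks 19 through 22. A minimal sketch of that computation follows, computing rho as the Pearson correlation of the two rank lists, with tied values given average ranks. The five workers' figures are invented for illustration; the study had 18 operators.

```python
import math


def average_ranks(values: list[float]) -> list[float]:
    """1-based ranks, with tied values given the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman_rho(x: list[float], y: list[float]) -> float:
    """Spearman's rho as the Pearson correlation of the two rank lists."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    sxy = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sxx = sum((a - mx) ** 2 for a in rx)
    syy = sum((b - my) ** 2 for b in ry)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical hourly production for five workers.
week_5 = [40, 62, 38, 58, 47]
weeks_19_to_22 = [[70, 72, 71, 73], [85, 86, 84, 85], [66, 65, 67, 66],
                  [90, 91, 92, 90], [75, 76, 74, 75]]
criterion = [sum(w) / len(w) for w in weeks_19_to_22]
print(round(spearman_rho(week_5, criterion), 2))  # prints 0.9
```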


Summary and Conclusions 


Production during early weeks of the learning period was correlated with average production after the completion of learning for a group of power sewing-machine operators.
Four hypotheses were proposed concerning 
these correlations; three proved tenable: 


1. Correlations are low during the first few 
weeks of training, when the learning curve is 
rising most rapidly. 

2. Correlations rise steadily as the learning 
period progresses.




3. Correlations are lower than those ob- 
tained in previous studies using criteria which 
reflect performance during learning rather 
than ultimate level of proficiency. 

A fourth hypothesis received no support in 
the present study. Correlations were not 
lower but, in fact, somewhat higher than those 
obtained previously for clerical workers, for 
whom it was suggested that abilities required 
during learning might be more similar to those 
required later than for the perceptual-motor 
tasks. 


Received June 24, 1955. 


References 


1. Blankenship, A. B., & Taylor, H. R. Prediction 
of vocational proficiency in three machine op- 
erations. J. appl. Psychol., 1938, 22, 518-526. 

2. Brown, C. W., & Ghiselli, E. E. The relationship 
between the predictive power of aptitude tests 
for trainability and for job proficiency. J. 
appl. Psychol., 1952, 36, 370-372. 

3. Kendall, M. G. The advanced theory of statistics. 
London: Charles Griffin, 1948. Vol. 1, p. 401.

4. Kornhauser, A. W. A statistical study of a group 
of specialized office workers. J. pers. Res., 
1923, 2, 103-123. 

5. McGehee, W. Cutting training waste. Personnel
Psychol., 1948, 1, 331-340.




The Measurement of Personal Factors Related to Success of 
Office Workers¹


Mearl R. Guthrie 


Bowling Green State University 


Numerous studies have shown that the per- 
sonal traits of employees are considered to be 
a very important factor by well-informed em- 
ployers. Research reveals that undesirable 
personal characteristics are the most common 
reason for the failure of office workers to be 
promoted. Various attempts have been made 
to measure the personal traits needed for suc- 
cess on the job, but there still remains a defi- 
nite need for an easy-to-interpret technique 
that will aid in the selection of employees for 
office work on the basis of their personal 
traits. 


Problem 


The main purpose of this study was to de- 
velop a technique which would aid in the dif- 
ferentiation of employees who have desirable 
personal traits for office work from those who 
do not possess such personal traits. The sec- 
ondary purposes were to determine the va- 
lidity and reliability of the technique after it 
had been developed. 

The study was limited to female office work- 
ers whose duties did not involve supervising 
or order giving. It was further limited to of- 
fices in which five or more employees worked 
together, because most office executives agree 
that the ability to work and get along with 
other people is one of the most important 
traits an office worker can have. 

A definition of all the personal traits neces- 
sary for success in office work was not at- 
tempted. It was considered more appropriate 
to develop and validate the instrument on the 
basis of over-all personal characteristics for 
office work. 


Procedure 


Development of the items. Several types of instruments and items were considered, and the opinion-type survey form was chosen for the purpose at hand. A sample of the opinion-type statement is shown below:


The supervisor is usually to blame when employees
fail to follow directions.                          SA   a   ?   d   SD


After reading such a statement and deciding how one feels about it, the office worker draws a circle around one of five possible answers: Strongly Agree, agree, uncertain or undecided, disagree, or Strongly Disagree. A total of 457 items were developed by a canvass of the literature and an evaluation of the experience of a number of people with a variety of backgrounds in office work, psychology, testing, etc.

¹ A dissertation submitted to the University of Minnesota in partial fulfillment of the Ph.D. degree.

Selection of items. It is a relatively easy task for
an office executive or supervisor to select office work- 
ers at the extreme ends of the personal trait con- 
tinuum. The officials of 50 offices, representing a 
wide variety of business activities, were asked to 
select one or more employees who had very satis- 
factory personal traits for office work and one or 
more employees who had very unsatisfactory per- 
sonal traits for office work. Specific criteria were 
provided for each official to aid him in selecting the 
two extreme groups. 

The Survey, consisting of the 457 items, was ad- 
ministered by the writer under a standard set of 
conditions and instructions to 100 “satisfactory” and 
100 “unsatisfactory” office workers. It was hoped 
to minimize the possibility of the employee faking 
his opinions by calling the instrument a survey and 
by not requiring the employee to identify his paper. 
However, various methods were used to identify the 
papers as satisfactory or unsatisfactory after they 
were completed. 

Each item on each survey was tallied on a tabula- 
tion sheet and the totals in each of the cells com- 
puted. These totals were also percentages because 
there were 100 papers in each of the two groups. 
The following item is an example: 


“Those in charge” have much more freedom 
in the office than the employees. 


[Response percentages of the Satisfactory and Unsatisfactory groups in the SA, a, ?, d, and SD categories; the figures are illegible in the source scan.]


The following criteria were used in the item-selec- 
tion process: 

1. Did the item adequately differentiate between 
the two groups of office workers? 

2. Were the differences in percentages consistently 
positive or negative in the Strongly Agree and agree 
categories, and in the disagree and Strongly Disagree 
categories? 


88 Mearl R. 


3. Was there at least one category in the item in 
which the difference between percentages was sig- 
nificant at the 10% level or less? 

4. Was the item clear in meaning? 

5. Did the statistic chi square substantiate the
ability of the item to differentiate between the two 
groups? 
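Criterion 5 can be illustrated with a chi-square test of independence on an item's 2 × 5 table of response counts. The sketch below is a hypothetical illustration, not the author's worksheet: it assumes raw counts are available (with 100 papers per group, counts equal percentages), that every response category was chosen by at least one worker, and it relies on the closed-form chi-square tail probability that holds only for even degrees of freedom (a 2 × 5 table has 4). The counts are those of the example item given with the scoring key.

```python
import math

def chi_square_2xk(group_a, group_b):
    """Chi-square statistic and df for a 2 x k table of response counts.

    group_a, group_b: counts per response category (SA, a, ?, d, SD).
    Assumes every category total is positive.
    """
    if len(group_a) != len(group_b):
        raise ValueError("both groups need the same response categories")
    n_a, n_b = sum(group_a), sum(group_b)
    n = n_a + n_b
    chi2 = 0.0
    for a, b in zip(group_a, group_b):
        column_total = a + b
        if column_total == 0:
            raise ValueError("a response category was never chosen")
        for observed, row_total in ((a, n_a), (b, n_b)):
            expected = row_total * column_total / n
            chi2 += (observed - expected) ** 2 / expected
    df = len(group_a) - 1  # (2 - 1) * (k - 1)
    return chi2, df

def chi_square_upper_tail(chi2, df):
    """P(X >= chi2) for a chi-square variable; closed form valid for even df only."""
    if df <= 0 or df % 2:
        raise ValueError("this closed form needs a positive, even df")
    half = chi2 / 2
    term = total = 1.0
    for i in range(1, df // 2):  # adds (chi2/2)**i / i! for 0 < i < df/2
        term *= half / i
        total += term
    return math.exp(-half) * total

satisfactory = [10, 22, 8, 46, 14]    # SA, a, ?, d, SD
unsatisfactory = [16, 48, 8, 26, 2]
chi2, df = chi_square_2xk(satisfactory, unsatisfactory)
p = chi_square_upper_tail(chi2, df)
print(f"chi-square = {chi2:.2f}, df = {df}, p = {p:.6f}")
```

For this item the statistic is about 25.6 with 4 degrees of freedom, well beyond the 1% point (about 13.3), so under these assumptions the item would pass criterion 5.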

A total of 74 items were chosen by this process as 
outstanding in their ability to discriminate between 
the two groups. 

Scoring the items. A scoring key was developed 
by assigning weights of plus one, zero, and minus 
one to the five responses of each of the 74 items. 
These weights were determined by utilizing the dif- 
ference between percentages in each of the five cate- 
gories and the direction of the difference, as, for ex- 
ample, in the following item: 


                  SA     a     ?     d    SD
Satisfactory      10    22     8    46    14
Unsatisfactory    16    48     8    26     2
Difference        −6   −26     0    20    12
Significance            1%          5%    1%
Weight                  −1          +1    +1
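A minimal sketch of how such a key could be derived, assuming a two-proportion z-test at the 10% level (criterion 3) as the significance test. The article does not say which test was used, so that choice and the helper names `item_weights` and `score_survey` are illustrative. Categories whose difference is not significant get weight 0; significant ones get +1 if the "satisfactory" group chose them more often and −1 otherwise. Criteria 2, 4, and 5 are not checked here.

```python
import math

CATEGORIES = ["SA", "a", "?", "d", "SD"]

def two_proportion_p(p1, p2, n1, n2):
    """Two-sided p-value for p1 - p2 (pooled normal approximation; an assumed test)."""
    pooled = (p1 * n1 + p2 * n2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    if se == 0:
        return 1.0
    z = abs(p1 - p2) / se
    return math.erfc(z / math.sqrt(2))  # P(|Z| >= z) for a standard normal Z

def item_weights(sat_pct, unsat_pct, n_sat=100, n_unsat=100, alpha=0.10):
    """Weight +1, 0 or -1 for each response category of one item."""
    weights = []
    for s, u in zip(sat_pct, unsat_pct):
        p = two_proportion_p(s / 100, u / 100, n_sat, n_unsat)
        if p <= alpha:
            weights.append(1 if s > u else -1)
        else:
            weights.append(0)
    return weights

def score_survey(responses, key):
    """Total score: sum of the weights of the chosen categories.

    responses: dict item -> chosen category; key: dict item -> list of weights.
    """
    return sum(key[item][CATEGORIES.index(answer)] for item, answer in responses.items())

weights = item_weights([10, 22, 8, 46, 14], [16, 48, 8, 26, 2])
print(weights)  # [0, -1, 0, 1, 1]
key = {"example": weights}
print(score_survey({"example": "d"}, key))  # 1
```

Under this assumed test the example item receives weights of −1, +1 and +1 for the a, d and SD categories and 0 elsewhere.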


Table 1 shows a comparison of the scores made 
by the two groups. A significant difference between 
the means of the two groups would be expected be- 
cause these two groups were the basis for the selec- 
tion of the items. 

Selection of additional items. On the basis of the 
scores made on the 74 items, new criterion groups 
were established consisting of the upper and lower 
25% of the original 200 surveys. The responses to 
the 457 items by the two new criterion groups were 
then tabulated. The same procedure used to select 
the 74 items was used to select a total of 150 items. 
Chi square was not used in the selection of addi- 
tional items because it was found to add little, if 
any, objective information to the original item-se- 
lection process. 
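The re-selection step amounts to ranking the 200 surveys by their score on the 74-item key and keeping the top and bottom quarters. A small sketch with a hypothetical helper `criterion_groups`; the tie-breaking rule at the cut-off is an assumption, since the article does not address ties, and the scores are made up.

```python
def criterion_groups(scores, fraction=0.25):
    """Return (upper, lower) lists of survey ids ranked by total score.

    scores: dict mapping survey id -> total score on the current key.
    Ties at the cut-off are broken by sort order (an assumption).
    """
    k = round(len(scores) * fraction)
    if k == 0 or 2 * k > len(scores):
        raise ValueError("fraction does not give two non-empty, disjoint groups")
    ranked = sorted(scores, key=scores.get, reverse=True)  # highest score first
    return ranked[:k], ranked[-k:]

# Hypothetical scores for 200 surveys (not the study's data)
scores = {f"paper{i:03d}": (i * 37) % 101 - 50 for i in range(200)}
upper, lower = criterion_groups(scores)
print(len(upper), len(lower))  # 50 50
```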

The comparison in Table 2 indicates that the scor- 
ing key developed for the 150 items was satisfactory. 
It is of interest to note that not one score of the 
lower 25% group exceeded the mean of the upper 
25% group, and vice versa.


Table 1

Comparison of Scores Made by the “Satisfactory” and “Unsatisfactory” Groups of Office Workers on 74 Items

                 Satisfactory   Unsatisfactory
N                     100            100
High score             34             19
Low score             −13            −36
Range                  47             55
Mean                12.75          −7.15
SD                   9.92           9.59


Table 2

Comparison of Scores Made by the Upper and Lower 25% of the Original Groups on 150 Items

                 Satisfactory   Unsatisfactory
N                      50             50
High score             83            −14
Low score             −31           −107
Range                 114             93
Mean                 23.1          −58.3
SD                     29           21.1


Guthrie


Validity

The method used in developing the items and the item-selection process contributed to the validity of the individual items. The procedure designed to determine the validity of the instrument as a whole was to (a) administer the 150 items to two unselected groups of office workers consisting of 73 full-time office workers and 54 business administration co-op students, (b) obtain ratings from several sources concerning the personal traits of each office worker, and (c) compute coefficients of correlation between scores and ratings.

Ratings for each individual worker were obtained from the following three sources: (a) the several fellow workers in each group, (b) the immediate on-the-job supervisor of each individual, and (c) the office manager or personnel manager of each company concerned for each full-time office worker and the University coordinator for each co-op student.

Ratings by fellow workers. An indirect method was developed by which each individual could be rated by the other members of her working group. A number of questions similar to the one below were used.


If I could have my lunch hour at the same time
as one of the above persons,

My first choice would be ____________
My second choice would be ____________
My third choice would be ____________
My fourth choice would be ____________


Following completion of the Office Workers Opinion Survey, each individual was asked to answer seven questions on the Fellow Worker’s Rating Scale concerning the other members of her group. These ratings were made anonymously, but a check was made to make sure that an individual did not choose her own name. From these ratings a numerical score was determined for each individual for purposes of comparison with the individual’s score on the Office Workers Opinion Survey (OWOS).


Personal Factors and Success of Office Workers 89


Table 3

Correlations Obtained Between OWOS Scores and Various Ratings on the Unselected Group of 73 Full-Time Office Workers

                                          Ratings                  Multiple
                                        2        3         4       correlations
1. Office Workers Opinion Survey      .29*    .47**      .20       .51** (OWOS and 2, 3, 4)
2. Fellow workers’ ratings                    .43**   [illegible]  .49** (OWOS and 2, 3)
3. Supervisors’ man-to-man ratings                      .43**      .30*  (OWOS and 2, 4)
4. Office managers’ man-to-man rating                              .49** (OWOS and 3, 4)

[illegible] Entry could not be read in the source scan.
* Statistically significant at 5% level.
** Statistically significant at 1% level.

Ratings by office managers, personnel man- 
agers, on-the-job supervisors, and the Univer- 
sity coordinator. The same types of rating 
devices were used to obtain ratings for each 
individual office worker from each of the 
above persons. 

A man-to-man rating device using a forced- 
choice technique was developed. This method 
required each rater to compare the personal 
traits of each office worker in his group with 
the personal traits of every other worker in 
the group. In this manner a numerical rat- 
ing was obtained for each person and, also, it 
was possible to check the consistency of the 
rater. 
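The article does not give the scoring rule for the man-to-man device. One common way to turn a complete set of paired comparisons into ratings is to count how often each worker is preferred, and Kendall's count of circular triads (A over B, B over C, C over A) is a standard check of a rater's consistency. The sketch below assumes both; the function names and the worker names are hypothetical.

```python
from itertools import combinations

def man_to_man_scores(workers, judgments):
    """Count how often each worker was preferred in a complete paired comparison.

    judgments: iterable of (preferred, other) tuples, one for every pair of workers.
    """
    workers = list(workers)
    known = set(workers)
    chosen = {}
    for preferred, other in judgments:
        if preferred not in known or other not in known or preferred == other:
            raise ValueError(f"invalid judgment {preferred!r} over {other!r}")
        pair = frozenset((preferred, other))
        if pair in chosen:
            raise ValueError(f"pair {preferred!r}/{other!r} judged twice")
        chosen[pair] = preferred
    wins = {w: 0 for w in workers}
    for a, b in combinations(workers, 2):
        pair = frozenset((a, b))
        if pair not in chosen:
            raise ValueError(f"pair {a!r}/{b!r} was not judged")
        wins[chosen[pair]] += 1
    return wins

def circular_triads(wins):
    """Kendall's number of circular triads: [n(n-1)(2n-1) - 6 * sum(a^2)] / 12."""
    n = len(wins)
    numerator = n * (n - 1) * (2 * n - 1) - 6 * sum(a * a for a in wins.values())
    return numerator // 12

workers = ["Ann", "Bea", "Cora", "Dot"]  # hypothetical office group
judgments = [("Ann", "Bea"), ("Ann", "Cora"), ("Dot", "Ann"),
             ("Bea", "Cora"), ("Bea", "Dot"), ("Cora", "Dot")]
wins = man_to_man_scores(workers, judgments)
print(wins, circular_triads(wins))  # {'Ann': 2, 'Bea': 2, 'Cora': 1, 'Dot': 1} 2
```

A rater who ranks the group perfectly transitively produces zero circular triads; the two here come from Dot being preferred to Ann while ranking below both Bea and Cora.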

A descriptive scale-type rating, employing 
five rather complete descriptions, was also 
used. This was not very helpful owing to the 
tendency of most raters to place all the work- 
ers in similar categories. 


Discussion 


The multiple-correlation coefficient between 
scores on the Survey and the three ratings 
was .51 (see Table 3), which is rather high 
for a validity coefficient. 
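A multiple correlation of this kind can be obtained from the intercorrelations alone: if r is the vector of correlations between the Survey and the ratings and R_xx the correlation matrix among the ratings, then R² = rᵀR_xx⁻¹r. The sketch below is an illustration only. It treats the Survey score as the variable predicted from the ratings (an assumption, since the article reports the coefficients but not the computation) and uses hypothetical input values.

```python
import math

def multiple_correlation(r_xy, r_xx):
    """Multiple R of one variable with several others, from correlations only.

    r_xy: correlations of the predicted variable with each predictor.
    r_xx: correlation matrix among the predictors (square, nonsingular).
    R**2 = r_xy' * inverse(r_xx) * r_xy, found by solving r_xx * b = r_xy.
    """
    k = len(r_xy)
    # Augmented matrix [r_xx | r_xy], reduced by Gauss-Jordan elimination
    m = [[float(v) for v in r_xx[i]] + [float(r_xy[i])] for i in range(k)]
    for col in range(k):
        pivot = max(range(col, k), key=lambda i: abs(m[i][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("predictor correlation matrix is singular")
        m[col], m[pivot] = m[pivot], m[col]
        for i in range(k):
            if i != col:
                factor = m[i][col] / m[col][col]
                for j in range(col, k + 1):
                    m[i][j] -= factor * m[col][j]
    beta = [m[i][k] / m[i][i] for i in range(k)]  # standardized regression weights
    return math.sqrt(sum(b * r for b, r in zip(beta, r_xy)))

# Hypothetical correlations: survey with two ratings, and between the two ratings
print(round(multiple_correlation([0.30, 0.50], [[1.0, 0.40], [0.40, 1.0]]), 3))  # 0.512
```

Because correlated ratings overlap in what they share with the Survey, the multiple R rises only modestly above the largest single correlation.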

The single validating criterion showing the highest correlation (.47) with scores on the Survey for the full-time group was the on-the-job supervisors’ man-to-man rating. There may be several reasons for this high correlation: (a) supervisors may know the personal traits of office workers better than other raters, (b) these ratings were obtained with the forced-choice technique, and (c) supervisors may have a conception of what constitutes satisfactory personal traits for office work which is similar to that measured by the Office Workers Opinion Survey. At any rate it is of interest that the raters closely associated with the office workers gave ratings which correlated highest with scores on the Survey.

The low correlation between the descrip- 
tive scale ratings and the scores on the Sur- 
vey was very likely due to the type of rating 
scale used. There were five descriptive cate- 
gories in the scale and most raters were very 
reluctant to use the lowest two categories. 

The various coefficients of correlation ob- 
tained for the group of 54 co-op students were 
very similar to those obtained for the full- 
time group. 

The correlation between ratings by fellow 
office workers and scores on the Survey was 
rather low (.293) for the full-time group and 
rather high (.506) for the co-op group. This
may have been because closer relationships existed
among the members of the co-op group than among
those of the full-time group.

The data presented in the foregoing tables 
concerning the validity of the Office Workers 
Opinion Survey would seem to suggest that 
the Survey may be of use in helping com- 
panies evaluate the personal traits of office 
workers. The variation of OWOS scores 
among companies indicates that there is a 
wide variation of the personal traits of work- 
ers from company to company, but if each 
company would standardize the Office Workers
Opinion Survey on the basis of its own
experience, the Survey could prove very help- 
ful in their employment and promotion pro- 
cedures. 

Reliability. The method for estimating the 
reliability of an instrument using differential 
weighting of item responses developed by 
Hoyt and Stunkard² was used in determin-
ing the reliability of the Office Workers Opin-
ion Survey. This method utilizes the follow-
ing data: (a) the scores of each individual
for all items, (b) the scores of each item for
all individuals, (c) the sum of the scores on 
all items for all individuals, (d) the sum of 
the scores of all individuals for all items, (e) 
the number of items, and (f) the number of 
individuals. 
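Hoyt's procedure is a persons × items analysis of variance: the reliability is 1 − MS_residual / MS_persons, and it needs exactly the quantities listed above (person totals, item totals, the grand total, and the numbers of items and individuals). A minimal sketch, assuming the weighted item scores (−1, 0, +1) are arranged one row per person; the function name and the small matrix are illustrative. The coefficient is algebraically identical to Cronbach's alpha, which makes a convenient cross-check.

```python
def hoyt_reliability(scores):
    """Hoyt's analysis-of-variance reliability for a persons x items score matrix.

    scores[i][j] is person i's weighted score on item j (e.g. -1, 0, +1).
    Returns 1 - MS_residual / MS_persons.
    """
    n = len(scores)                 # (f) number of individuals
    k = len(scores[0]) if n else 0  # (e) number of items
    if n < 2 or k < 2 or any(len(row) != k for row in scores):
        raise ValueError("need a rectangular matrix with at least 2 persons and 2 items")
    person_totals = [sum(row) for row in scores]                     # (a)
    item_totals = [sum(row[j] for row in scores) for j in range(k)]  # (b)
    grand_total = sum(person_totals)  # (c) and (d): the grand total, summed either way
    correction = grand_total ** 2 / (n * k)
    ss_total = sum(x * x for row in scores for x in row) - correction
    ss_persons = sum(t * t for t in person_totals) / k - correction
    ss_items = sum(t * t for t in item_totals) / n - correction
    ss_residual = ss_total - ss_persons - ss_items
    ms_persons = ss_persons / (n - 1)
    ms_residual = ss_residual / ((n - 1) * (k - 1))
    if ms_persons == 0:
        raise ValueError("no variation between persons")
    return 1 - ms_residual / ms_persons

# Hypothetical weighted scores: 4 persons x 3 items
matrix = [[1, 0, 1], [1, 1, 1], [0, 0, -1], [-1, 0, -1]]
print(round(hoyt_reliability(matrix), 3))  # 0.838
```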

A sample of one-half (64) of the unse- 
lected groups of 73 full-time office workers 
and 54 co-op students was obtained by cod- 
ing each of the 127 papers and using a Table 
of Random Numbers. A reliability coeffi- 
cient of .904 was obtained for the Survey, 
which is relatively high for an instrument of 
this type. 


Summary 


The purpose of this study was to investi- 
gate the feasibility of developing a technique 
which would aid in the differentiation of those 
employees who have desirable personal traits 
for office work from those who do not possess 
such personal traits. 

The original instrument consisted of 457 
opinion-type statements. These statements 
covered a variety of impressions concerning 
office work and there were five possible re- 
sponses to each statement. The 457 state- 
ments were administered to two groups of 
“satisfactory” and “unsatisfactory” office 
workers chosen from 50 offices on the basis 
of their personal traits. 

The item-selection process used to choose the items which differentiated between the two groups utilized: (a) the difference between the percentage of responses by the two groups to each category of each item, (b) the consistency of the direction of the differences between percentages in each item, (c) the significance of the differences between percentages, and (d) the statistic chi square. A total of 74 items were chosen as outstanding in their ability to discriminate between the two groups.


2 Cyril J. Hoyt and Clayton L. Stunkard, Estimation of test reliability for unrestricted item scoring methods. Educ. psychol. Measmt, 1952, 12, 756-758.

A scoring key was developed for these 74 
items by assigning weights of plus one, zero, 
and minus one to the five responses of each 
item. This weighting was based on the di- 
rection and the significance of the differences 
between the percentages in each category. 
The 200 original surveys were scored on the 
basis of this scoring key and new criterion 
groups were established consisting of the 
upper and lower 25% of the original 200 
surveys. A total of 150 items were chosen 
which discriminated between these two new 
criterion groups. The item-selection process 
used was similar to the one used to select the 
74 items with the exception of chi square. 

The 150 items, called the Office Workers 
Opinion Survey, were then administered to 
two unselected groups of office workers con- 
sisting of 73 full-time office workers and 54 
business administration co-op students. Rat- 
ings concerning the personal traits of the in- 
dividuals in these two groups were obtained 
from (a) fellow workers, (b) on-the-job su- 
pervisors, and (c) office managers or the Uni- 
versity coordinator. 

Multiple correlations obtained between 
scores on the Survey and the three ratings 
were .513 for the full-time group of office 
workers and .572 for the co-op students. 
These correlations were statistically signifi- 
cant at the 1% level. A reliability coeffi- 
cient of .904 was obtained for the Survey 
using the analysis of variance method devel- 
oped by Hoyt and Stunkard. 

Conclusions. These data and statistics in- 
dicate that the Office Workers Opinion Sur- 
vey shows promise for use in evaluating the 
personal traits of office workers. It should 
be emphasized, however, that any instrument 
of this type should be standardized by a par- 
ticular firm on its own employees before in- 
dividual results are used as a basis for 
making decisions concerning an applicant or 
employee. 

Received June 21, 1955. 


The Journal of Applied Psychology 
Vol. 40, No. 2, 1956 


Internal Relations of Elemental Motions Within a Task 


Norman B. Hall, Jr. 


Dunlap & Associates, Inc. 


Time-and-motion economy often has as its 
foundation an atomistic concept of human be- 
havior. Revision and improvement of work 
cycles are made on the assumption that the 
individual elements which make up the total 
complex task can be removed or rearranged 
as though these elements are independent 
units. The elimination of a designated un- 
desirable element is assumed to benefit the 
total time to perform the task. In authorita- 
tive texts can be found the following state- 
ments, “Lengthy hold or unavoidable delay 
operations offer particularly good possibilities 
for improving ...” (4, p. 159), or, “The 
simo chart shows very distinctly where de- 
lays occur in the cycle and it aids in finding 
an effective way of eliminating the delays” 
(1, p. 101). 

On the other hand this atomistic concept 
has been subject to criticism. “The indi- 
vidual working at a task, however simple, 
knits all the part activities into a whole; 
therefore the changing of any part activity 
may be expected to change the pattern of the 
larger task” (3, p. 268). The failure to con- 
sider the whole task and the fact that there 
exist internal relations between the elements 
is all too frequently overlooked by time-and- 
motion analysts, notwithstanding experimen- 
tal results which demonstrate that removal of 
an element can result in an increase in the 
time of the remaining element of the cycle 
(3, p. 269). 

As an illustration of how the times for work 
elements interact, Barnes and Mundel (2) re- 
ported the times for subjects to pick up and 
insert pins in holes varying from 1⁄4 to 1 inch 
in diameter. The time in seconds to position 
a pin for insertion and to insert and remove 
a pin varied, as expected, with the size of the 
hole. There was also a variation, however, in the time to transport the pin from the supply to the hole. Such variation reflects an interaction between elements. It is the purpose of this study to provide further evidence concerning the extent of interaction of elements in an actual factory operation, thereby suggesting additional grounds on which to reject the atomistic concept of human behavior in connection with micro-motion analysis.


Procedure 


The industrial operation selected for study is the 
fastening of numbered or lettered plastic adding- 
machine key tops to metal key stems by means of 
acetate glue and a pneumatic press. The assembly 
process is as follows: The operator sits in front of 
a foot-operated pneumatic press which is mounted 
on a table. At her right hand is a bin containing 
“stems”; these are steel stampings, all of standard 
design. At her left is a bin containing the plastic 
key tops, all uniform as to shape and symbol at any 
given time. Each key top has a recess in the under- 
side for the reception of the stem. The operator 
picks up a key top with her left hand, inspects it to 
orient the symbol, and places it upside down on the 
anvil of the press. The anvil is recessed to receive 
and hold the key top when pressure is applied. 
From the right-hand bin the operator picks up a 
stem, dips it momentarily in the acetate cup and 
positions it in the press, with the acetate-treated end 
over the key top. The operator then touches a foot 
pedal which opens a valve that admits air at 80 
pounds per square inch pressure to an air cylinder 
and the press forces the stem into the recess in the 
key top. Upon releasing pressure on the foot pedal, 
the ram retracts and a small cam on the front of 
the press fixture throws the completed assembly 
into a chute leading to a bin underneath the table. 

The machine operator chosen for this investiga- 
tion was a highly skilled woman who had been em- 
ployed on the operation for three years. Photo- 
graphic samples of the key-top assembly process 
were made at eight intervals during the day—four 
in the morning and four in the afternoon. The 
times selected for these samples were: 7:45 A.M.,
9:00 A.M., 10:00 A.M., 11:00 A.M., 12:30 P.M., 1:15
P.M., 2:30 P.M., and 3:30 P.M. These times were se-
lected in order to obtain values for the elemental
motions where the greatest time changes in the com- 
posite cycle might be expected, due to warm-up, 
practice, or the effects of continued activity. The 
duration of each sample was approximately 45 sec- 
onds. Slightly over 18 feet of film were used for 
each sample. Each sample yielded an average of 15 
cycles. A frame-by-frame analysis of each sample 
was made according to the customary practice of 
micro-motion analysis (4), from which the simo- 
motion chart, Fig. 1, was constructed. 


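[Fig. 1. Simo-motion chart for key-top operation. Columns: left-hand description, time in minutes, right-hand description. Legible entries include: position key top on anvil face (.005); transport empty to key-top bin (.004); grasp key top; transport of key top to anvil; visual inspection to determine the position of the character; balancing delay until the right hand reaches the point for simultaneous work; position stem in slot of key top; position stem and key top to press; return to bin.]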


Results 


The internal relations which exist between 
the individual therbligs (elementary subdivi- 
sions of a cycle of motions) are shown most 
clearly by a correlation value. These correla- 
tion values were computed by pairing the 
mean time for a therblig from each individual 
sample (based on 15 values) with the mean 
time of another therblig from the same sam- 
ple. Forty such scores were available for the 
determination of each correlation value, as 
there were eight samples per day for five 
days. The number of possible pairings of the
therbligs of the right and left hands is 66,
and the correlation values for each pairing
are shown in Table 1.
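The pairing scheme can be sketched as follows: for each of the 12 therbligs (12 × 11 / 2 = 66 pairs), take the 40 sample means, correlate every pair, and flag coefficients that reach the 5% and 1% levels. The article does not say how significance was judged; this sketch uses the Fisher z approximation, which is an assumption, and the data in the usage lines are synthetic stand-ins, not the study's measurements.

```python
import math
import random
from itertools import combinations
from statistics import NormalDist

def pearson_r(x, y):
    """Pearson correlation of two equal-length lists (assumes neither is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def significance_level(r, n):
    """Return '1%', '5%' or None from a two-sided Fisher z approximation (assumed test)."""
    if abs(r) >= 1:
        return "1%"
    z = abs(math.atanh(r)) * math.sqrt(n - 3)
    p = 2 * (1 - NormalDist().cdf(z))
    if p <= 0.01:
        return "1%"
    if p <= 0.05:
        return "5%"
    return None

def pairwise_correlations(sample_means):
    """sample_means: dict therblig -> list of per-sample mean times, same sample order."""
    results = {}
    for a, b in combinations(sorted(sample_means), 2):
        r = pearson_r(sample_means[a], sample_means[b])
        results[(a, b)] = (r, significance_level(r, len(sample_means[a])))
    return results

# Synthetic stand-in data: 12 therbligs x 40 samples (8 per day x 5 days)
rng = random.Random(7)
means = {f"therblig_{i:02d}": [rng.gauss(0.010, 0.002) for _ in range(40)]
         for i in range(12)}
results = pairwise_correlations(means)
significant = sum(1 for _, level in results.values() if level)
print(len(results), significant)
```

With 40 paired means, the 5% point under this approximation is about |r| = .31, so the r = −.35 between the two grasp therbligs would count as significant at that level.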

A breakdown of these correlations is as 
follows: 


Level of significance    Negative   Positive
Greater than 5%             27         28
5%                           4          3
1%                           3          1
Total                       34         32


Discussion


The highest positive correlation (r = .57) is between the therbligs “position to press” by the right hand and “unavoidable delay” by the left hand. A study of the simo-motion chart, Fig. 1, shows that maintenance of proper synchronism between the two hands requires the interaction of these two elements. The extent of this relation is given by the correlation value. An additional interaction between the left and right hand is provided by the significant correlation of r = −.35 between “grasp stem” (right hand) and “grasp key top” (left hand).

Statistical study of the values in the form 
of analysis of variance indicated that fluctua- 
tions in individual motions from sample to 
sample throughout the day are present and 
significant. The cycle time, however, re- 
mains constant. The constancy of produc- 
tion can be clearly observed in Table 2. 
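The averages and M.V. entries of Table 2 are consistent with M.V. being the mean (absolute) variation of the five daily figures about their average. For instance, the 12:30-1:00 row (21.7, 21.8, 21.7, 21.7, 20.9) gives an average of 21.56 and a mean variation of 0.264, printed as 21.6 and .26. A short sketch of that computation; the helper name is hypothetical.

```python
def average_and_mean_variation(values):
    """Return (mean, mean absolute deviation from the mean)."""
    n = len(values)
    mean = sum(values) / n
    mv = sum(abs(v - mean) for v in values) / n
    return mean, mv

row = [21.7, 21.8, 21.7, 21.7, 20.9]  # 12:30-1:00, Monday to Friday (Table 2)
mean, mv = average_and_mean_variation(row)
print(f"average {mean:.1f}, M.V. {mv:.2f}")  # average 21.6, M.V. 0.26
```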

Since the cycle time remains constant, con- 
stant production is a concomitant fact. In- 
ternal compensation must therefore occur be- 
tween the elemental motions of the cycle. 

Correlation coefficients reflect the magni- 
tude of this internal compensation and those 
which equal or exceed the 5% level of sig- 
nificance have been placed on Fig. 2. The 
most striking instance occurs in the negative correla-
tion value of −.93 between the therblig
“balancing delay until the right hand reaches 
the point for simultaneous work” and “grasp 
key top.” This clearly shows that if time is 
not consumed in the “balancing delay” ther- 
blig, it is in the “grasp of the key top.” 
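Why a constant cycle time produces negative correlations can be seen in a toy model: if two consecutive elements must together fill a fixed interval, any time one gains the other loses. The sketch below is a synthetic illustration, not the study's data; the 0.030-minute interval, the delay range and the noise level are arbitrary assumptions.

```python
import math
import random

def pearson_r(x, y):
    """Pearson correlation of two equal-length lists (assumes neither is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def simulate(samples=40, interval=0.030, jitter=0.0, seed=1):
    """Toy model (minutes): the delay varies at random and the grasp fills the
    rest of a fixed interval, plus optional independent noise with SD `jitter`."""
    rng = random.Random(seed)
    delay, grasp = [], []
    for _ in range(samples):
        d = rng.uniform(0.002, 0.010)
        delay.append(d)
        grasp.append(interval - d + rng.gauss(0.0, jitter))
    return delay, grasp

d, g = simulate()
print(round(pearson_r(d, g), 3))   # full compensation: r is -1
d, g = simulate(jitter=0.001)
print(round(pearson_r(d, g), 2))   # partial compensation: strongly negative
```

Adding independent noise to the grasp time weakens, but does not remove, the negative relation, which is the pattern the −.93 coefficient suggests.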




Table 1

Correlations Between Individual Therbligs
(N = 40. See Fig. 1 for an explanation of the symbols used here.)

[The matrix of the 66 correlations is illegible in the source scan. Row labels that survive include: grasp stem; TL stem; pos. to cup; TL to press; pos. & ass.; pos. to press; TE to stem bin; grasp key top; unv. delay; TE key-top bin.]

* Significant at the 1% level.
† Significant at the 5% level.


A further consideration of the values which 
are significant at the 5% level indicates that 
all correlations for the left hand are negative, 
while those of the right hand are both posi- 
tive and negative. This is to be expected in 
the light of the following consideration: If 
all correlation values were positive, produc- 
tion would tend to vary with the therblig 
times. To have all values negative would be 
impossible. The combination permits syn- 
chronism and internal compensation. It is 
important to note that these correlations have
meaning only if the cycle times are constant.
Significant correlations like those in Table 1
could be obtained with no correlation be- 
tween therbligs measured at any one time of 
day, but with significant differences between 
measurements of the same therbligs made at 
different times of day if the cycle time varied. 
In our case, however, such variation in time 
cycles during the day did not occur, as is re- 
flected by Fig. 2. 

Because the interrelation of the elemental 
motions of this and similar tasks is not ade- 
quately represented in the conventional form 
of simo-motion charts, it was considered valu-
able to construct a three-dimensional simo-
motion chart (see Fig. 2). The elemental
motions of the right hand are placed on the
upper circle, those of the left hand are on the 
lower circle. In both cases the sequence of 
elements is in a clockwise direction. The two 
circles are so placed that the corresponding 
movements of the hands are related as in the 
conventional simo-motion chart. Times for 
the elemental motions are represented in the 
three-dimensional simo-motion chart as pro- 
portional parts of the circle; total cycle time 
equals the full circle. Only correlation values 
which equal or exceed the 5% level of sig- 
nificance appear in Fig. 2. These correlation 
values demonstrate clearly the internal rela-
tions among the therbligs.
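Because each element occupies a share of the circle proportional to its time, the arcs for one hand can be laid out as below. Only the proportional rule comes from the text; the element order and times are placeholders, not values from Fig. 1, and reading the angles clockwise from the top is an assumed drawing convention matching the chart's clockwise sequence.

```python
def arc_layout(element_times):
    """Map (name, time) pairs, in cycle order, to (name, start_deg, end_deg) arcs.

    The full circle (360 degrees) represents one cycle; angles are meant to be
    drawn clockwise from the top of the circle (an assumed convention).
    """
    cycle = sum(t for _, t in element_times)
    if cycle <= 0:
        raise ValueError("cycle time must be positive")
    arcs, start = [], 0.0
    for name, t in element_times:
        end = start + 360.0 * t / cycle
        arcs.append((name, start, end))
        start = end
    return arcs

# Hypothetical left-hand element times in minutes (not taken from Fig. 1)
left_hand = [("grasp key top", 0.004), ("transport key top to anvil", 0.006),
             ("position key top on anvil", 0.005), ("transport empty to key-top bin", 0.004),
             ("balancing delay", 0.006)]
arcs = arc_layout(left_hand)
for name, start, end in arcs:
    print(f"{name:32s} {start:6.1f} to {end:6.1f} degrees")
```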

It is very probable that if this three-dimen- 
sional procedure is used with other tasks, it 
may well throw further light on the synchro- 
nism and interaction of complex effort. Pos- 
sibly in a more extensive study, the changes 
in these correlations from one time of day to 
another would reflect increasing compensatory 
synchronism related to fatigue. The absolute 
size of the average intercorrelation might well 
serve to indicate the degree of internal compensation for typical over-all variations in the therbligs.


Table 2

Production Record for Key-Top Assembly Operation
(Average pieces produced per minute for half-hour intervals throughout the working day for each day of the week)

Time         Monday  Tuesday  Wednesday  Thursday  Friday  Average  M.V.
7:30-8:00     22.7    19.0      20.3       21.2     21.1     20.9    .97
8:00-8:30     23.9    26.6      19.3       25.1     22.2     23.4   2.14
8:30-9:00     23.8    23.9      19.4       23.6     23.4     22.8   1.37
9:00-9:15     23.9    24.8      21.1       24.1     25.1     23.8   1.1
A.M. rest period
9:25-9:30     20.5    21.8      21.1       23.3     23.0     21.9    .96
9:30-10:00    21.5    23.0      20.8       23.3     22.2     22.2    .81
10:00-10:30   24.1    22.4      21.6       20.9     22.4     22.3    .82
10:30-11:00   23.8    24.7      21.5       24.2     23.4     23.5    .86
11:00-11:24   24.5    23.0      23.0       21.1     25.0     23.3   1.14
Lunch
11:54-12:30   22.6    22.7      21.2       20.3     21.7     21.7    .76
12:30-1:00    21.7    21.8      21.7       21.7     20.9     21.6    .26
1:00-1:30     26.6    23.4      21.7       23.0     24.2     23.8   1.30
1:30-1:45     22.6    24.3      23.9       22.4     22.4     23.1    .78
P.M. rest period
1:55-2:00     21.6    23.7      21.9       20.9     20.0     21.6    .94
2:00-2:30     21.6    23.4      21.9       22.8     22.7     22.5    .58
2:30-3:00     22.2    21.0      23.5       22.0     20.7     21.9    .82
3:00-3:30     20.5    22.8      22.9       21.8     20.3     21.7   1.00
3:30-4:00     20.5    23.9      22.7       21.8     20.3     21.8   1.16


[Fig. 2. Three-dimensional simo-motion chart. Chart labels legible in the scan include: transport stem to acetate cup; stem in slot of key top; transport stem to key top on anvil; transport of key top to anvil; position key top on anvil.]

Therbligs for the right hand are on the upper circle, therbligs for the left hand are on the lower circle. The sequences of motions of each hand are read in a clockwise direction. The numbers shown are correlations; all equal or exceed the 5% level of significance.


Summary and Conclusions


Micro-motion samples of an industrial operation were taken eight times during the work day, Monday through Friday. These samples were analyzed by the customary micro-motion analysis technique. Each sample provided approximately 15 cycles (i.e., an individual element would occur about 15 times in each sample), and the average value for each individual element within a sample was determined. For the purpose of obtaining a correlation value, average times of individual elements within each sample were paired. Forty such pairs of values were available for each correlation, and each elemental motion was correlated with every other, giving 66 correlations in all. Eleven of the correlations thus obtained equaled or exceeded the 5% level of significance.

The internal relations which exist between the elements of a manual operation are shown by these correlations between the time values for pairs of therbligs. If production is to remain constant, variations in time of the individual elements must be and are compensated for by either an increase or decrease in the time of the other elements which compose the cycle. This compensation may occur among the elements of one hand or between elements of both hands.

A three-dimensional simo-motion chart was 
used as a new method to represent the in- 
ternal relations and to demonstrate that the 
industrial operation functions as, and must be 
viewed as, an integrated whole and not as a 
simple summation of independent units. 

The results of this experiment provide evi- 
dence that the elements are related. The re- 
moval of one because of its undesirability may 
well be reflected in other elements. An 
atomistic concept of human behavior appears 
to ignore too much when considering motion 
economy. 


Received May 13, 1955. 


References 


1. Barnes, R. M. Motion and time study. (2nd ed.) New York: Wiley, 1940.

2. Barnes, R. M., & Mundel, M. E. Studies of hand motions and rhythm appearing in factory work. Univer. Ia. Stud. Engng, 1938, No. 12 (New ser. No. 350).

3. Ghiselli, E. E., & Brown, C. W. Personnel and industrial psychology. New York: McGraw-Hill, 1948.

4. Lowry, S. M., Maynard, H. B., & Stegemerten, G. J. Time and motion study and formulas for wage incentive. (3rd ed.) New York: McGraw-Hill, 1940.




Factorial Analysis of Complex Psychomotor Performance 
and Related Skills 


Edwin A. Fleishman and Walter E. Hempel, Jr. 


Skill Components Research Laboratory 
Air Force Personnel and Training Research Center¹


Relative to research in other aptitude areas, 
little is known about the organization of abili- 
ties in the aptitude area of perceptual-motor 
skill. This is understandable in view of the 
technical and administrative difficulties in- 
volved in constructing and assembling the 
batteries of apparatus tests required for such 
research. 

Even in the wartime Air Force classification 
research program, which involved the most 
extensive program of psychomotor test de- 
velopment ever attempted (6), no more than 
six psychomotor tests were ever included to- 
gether in any of the experimental test bat- 
teries subjected to factor-analysis study. 
These analyses and subsequent postwar Air 
Force analyses (3, 4, 10, 11) have been di- 
rected primarily at investigations of other 
aptitude areas (e.g., spatial visualization,
perception, memory, integration, reasoning). 
In those analyses in which more than one 
psychomotor test appeared, a factor com- 
mon only to these psychomotor tests and dis- 
tinct from the printed tests was consistently 
demonstrated and labeled “Psychomotor Co- 
ordination.” There was no opportunity, 
however, to construct experimental factor- 
analysis batteries, including a wider range of 
psychomotor tests, aimed at (a) splitting up 
this general psychomotor factor into possibly
more basic components, (b) clarifying the
status of certain secondary psychomotor factors
found on occasion in these studies, and/or
(c) discovering additional psychomotor fac- 
tors. 

A series of such experimental psychomotor test batteries is currently being constructed and subjected to factorial study in this laboratory (see, e.g., 1, 2, 5). Concurrently with these studies, however, factor analyses of certain relevant correlational matrices which exist from previous research would appear to provide additional insight and possible leads about the organization of abilities in this aptitude area.

1 This research was carried out under the Air Force Personnel and Training Research Center, Lackland Air Force Base, San Antonio, Texas, in support of Project 7703. Permission is granted for reproduction, translation, publication, use, and disposal in whole or in part by or for the United States Government. The data of this study have been reported in AFPTRC Research Bulletin 54-12.


Problem


The present paper, which represents an- 
other extension of the Air Force studies, is 
based on data previously published through 
the USAF School of Aviation Medicine, and 
originally collected in 1947–48 in a cooperative Air Force-Navy research project at Pensacola, Florida. In this project a wide variety of Air Force printed and apparatus tests
were assembled and administered to over 
1,000 Navy pilot candidates at the Naval Air 
Station. The test procedures and validities, re- 
liabilities, and intercorrelations obtained have 
been described in detail elsewhere (7, 8, 9). 
The present paper describes a factor analysis 
of the intercorrelations among certain vari- 
ables selected from the correlation matrix pub- 
lished by Payne (9); the analysis included 
23 test variables of which 16 were apparatus 
psychomotor tests and 7 were printed tests 
designed as possible substitutes for the ap- 
paratus tests. Among the apparatus tests se- 
lected were the four tests that have consist- 
ently defined the “Psychomotor Coordination” 
factor in previous Air Force analyses. Also 
included were tests which had occasionally 
represented a factor tentatively labeled “Psy- 
chomotor Precision.” It was also felt that 
the analysis would throw additional light on 
the role of spatial orientation and visualization
abilities in psychomotor performance.
In addition, it was possible to include the 
criterion of pilot success as one of the vari- 
ables, since the validities of all the tests also 
had been determined on this sample. The 




Factorial Analysis of Psychomotor Performance and Related Skills 97 


criterion represents graduation or elimination 
from pilot training for reasons of flying de- 
ficiency. The inclusion of the criterion in the 
analysis was aimed at identifying the factor 
structure of the criterion, especially with re- 
gard to psychomotor skills. 

It should be pointed out that after the 
completion of the present analysis and factor 
interpretations, the writers learned of a com- 
pletely independent series of analyses of some 
of these same data by Roff (12). The pres- 
ent analysis, however, differs from Roff’s 
analyses in several ways. Roff used different 
combinations of variables, favored an oblique 
reference frame, and did not include the pilot 
criterion in his factor analyses. In spite of 
these procedural differences, and some differ- 
ences in factor interpretations, the two analy- 
ses showed some agreement on certain of the 
common factors identified. In those cases 
where factors identified in the two studies 
appear identical or highly similar, this will be 
pointed out in the text. One additional difficulty
in making such comparisons is the fact
that Roff sometimes included multiple scores 
from the same test. This often allowed the 
appearance of “doublet”-type factors specific 
to individual tests. The present analysis was 
confined to single scores from each test. 


The Test Variables 


Brief descriptions of the 23 selected test 
variables follow. More detailed descriptions 
of the apparatus tests may be found else- 
where (6). Descriptions of the printed tests 
may be found in (9). 


Apparatus Tests: (code numbers refer to Air 
Force designations [6]) 


1. Plane Control (CM817B). The attitude of a 
model airplane is varied irregularly in its roll, pitch, 
and yaw axes by a motor-driven cam system. The 
examinee attempts to keep the airplane in a straight- 
and-level attitude by making compensatory adjust- 
ments of stick and pedal controls. 

2. Multidimensional Pursuit (CM813E). Pointers 
on each of four meters (representing airspeed, r.p.m., 
bank, and turn indicators) vary irregularly, continu- 
ously, and independently of one another. The ex- 
aminee attempts to keep all the pointers in the center 
of their respective scales by compensatory adjustments
of simulated stick, rudder, and throttle controls.




3. Rate Control (CM825A). A target line moves 
back and forth across a curved scale, with frequent
changes in direction and rate of movement. The 
examinee attempts to keep a pointer in coincidence 
with this line by adjustive manipulations of a knob 
control. 

4. Rotary Pursuit (CP410B). The examinee attempts
to keep a prod stylus in contact with a small
metallic target, set in a rapidly revolving phono- 
graph-type disk. 

5. Two-Hand Coordination (CM101B). The ex- 
aminee attempts to keep a target follower on a 
small target disk as the target moves irregularly and 
at varying rates. Movement of the target follower 
in the desired direction is controlled by the simul- 
taneous rotation of two lathe-type handles. 

6. Complex Coordination (CM701E). Patterns of 
lights are presented whose positions are to be matched 
by appropriate adjustments of stick and rudder con- 
trols. A correct response is accomplished only when 
both the hands and feet have completed and main- 
tained the appropriate adjustments, at which point a 
new pattern of lights to be matched is presented. 

7. Rudder Control (CM120C). The examinee sits
in a mock airplane cockpit, which he attempts to 
keep lined up steadily with one of three target lights, 
as they come on in front of him. His own weight
throws the seat off balance unless he applies and 
maintains proper correction by means of foot pedals. 

8. Direction Control (CP650A). Patterns of lights 
appear in the form of a cross (to indicate the “head- 
ing” of an airplane). The pattern may appear in 
different parts of the display panel and may repre- 
sent a small or large cross. Depending on the direc- 
tion of “heading,” the examinee must use both hands 
concurrently in manipulating the proper combina- 
tion of switches or switches and buttons. 

9. Single Dimension Pursuitmeter (CM801B6). The 
examinee makes compensatory adjustments to keep a 
horizontal line in a null position as it deviates from 
center in irregular fashion. Adjustments are made
by in-and-out movements of a control wheel, dampened
pneumatically to simulate the latency of aircraft
controls.

10. Compensatory Balance (CM510A). The ex- 
aminee controls the attitude of a platform contain- 
ing a tortuous alley maze so that a gravity-activated 
ball will roll through the correct pathways (at 10 
choice points) and not into blind alleys. Forward 
and backward tilt is controlled manually, side-to-side 
tilt by foot pedals. 

11. Controls Orientation (CP638A). The examinee 
is confronted with an upright panel containing a 
circular arrangement of 16 lamps and a horizontal 
panel containing a corresponding arrangement of 16 
toggle switches. For each successive problem he is 
required to push a switch that is the same number 
of switches distant from a variable reference point 
on the switch panel that a key lamp is from a 
stationary reference point on the lamp panel. The
problems are further complicated in that the key 




lamp is never lighted. Instead, its position is al- 
ways midway between two lighted lamps. 
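The response rule for the Controls Orientation Test amounts to counting positions around a circle of 16. The Python sketch below is one illustrative reading of the description, not the scoring procedure of the actual apparatus: it assumes that lamps and switches are numbered 0 to 15 in the same rotational direction, that the two lighted lamps are two positions apart so that the unlit key lamp lies between them, and that distances are counted in that one direction. The function names are hypothetical.

```python
N_POSITIONS = 16  # lamps on the upright panel; switches on the horizontal panel


def key_lamp(lit_a: int, lit_b: int) -> int:
    """Return the unlit key lamp lying midway between two lighted lamps.

    Assumption: the lighted lamps are exactly two positions apart around
    the circle, so exactly one lamp lies between them.
    """
    if (lit_b - lit_a) % N_POSITIONS == 2:
        return (lit_a + 1) % N_POSITIONS
    if (lit_a - lit_b) % N_POSITIONS == 2:
        return (lit_b + 1) % N_POSITIONS
    raise ValueError("lighted lamps must be two positions apart")


def correct_switch(lit_a: int, lit_b: int, lamp_ref: int, switch_ref: int) -> int:
    """Switch as many steps from switch_ref as the key lamp is from lamp_ref."""
    offset = (key_lamp(lit_a, lit_b) - lamp_ref) % N_POSITIONS
    return (switch_ref + offset) % N_POSITIONS


# Lamps 4 and 6 are lit, so the key lamp is 5; with the stationary lamp
# reference at 0 and the variable switch reference at 10, switch 15 is correct.
print(correct_switch(4, 6, lamp_ref=0, switch_ref=10))  # 15
```

In this reading the examinee must carry the lamp offset over to a different starting point on the switch panel and wrap around the circle when the count passes position 15.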

12. Pursuit Confusion (CM702B). The examinee 
attempts to keep a stylus on a variable-speed target 
as it moves through a diamond-shaped slot. The 
task is complicated by the fact that the entire target 
area is visible only by mirror vision. 

13. Drift Correction (paced). The display con- 
sists of two concentric circles of eight signal lamps 
each. For each stimulus setting, one light in the
inner circle (representing plane headings) and one in
the outer circle (representing wind direction) are 
lighted. The examinee pushes a toggle switch to the 
left or right to indicate the appropriate correction in 
heading. 

14. Discrimination Reaction Time (CP611E2). The 
examinee manipulates one of four toggle switches as 
quickly as possible in response to a series of visual 


Edwin A. Fleishman and Walter E. Hempel, Jr. 


stimulus patterns differing from one another with re- 
spect to the spatial arrangement of their component 
parts (e.g., position of a lighted red lamp relative to 
a lighted green lamp). 

15. Complex Multiple Reaction (CP512AX1). 
Visual and auditory stimuli alternate with one an- 
other as cues for either manual or pedal responses, 
respectively. The particular color or configuration of 
the visual cue determines which of three levers should
be manipulated. The relative pitch of the auditory 
cue determines which of two pedals should be
depressed.

16. Santa Ana Finger Dexterity (CM116A). This 
test involves a peg board containing 48 pegs having 
square bottom pieces and round tops. The examinee
removes each peg in turn, rotates it clockwise through 
180°, and reinserts it in its square hole with his pre- 
ferred hand. 


Table 1
Rotated Factor Loadings*

                                                                    Factors**
                                               I    II   III    IV     V    VI   VII  VIII    IX     X    h²
Variable                                    PC-I PC-II  SR-I SR-II    In    RC    PS    MD    Vz   Res
1. Plane Control                              25    50    00    07    15    15    02    24    09    05    43
2. Multidimensional Pursuit                   20    38    18    23    36    05    07    24    07   -01    47
3. Rate Control                               24    31    00    11    03    58    09    01    11   -08    53
4. Rotary Pursuit                             47    08    13    17    05    24    16    35   -06   -03    49
5. Two-Hand Coordination                      46    36    13    09    34    17   -04    10    04    15    55
6. Complex Coordination                       50    37    16    22    36    10    05    10    21   -13    67
7. Rudder Control                             40    56    11   -10    05    19    03    12   -05   -02    55
8. Direction Control                          15    17    39    27    02   -06    02    21    44    09    53
9. Single Dimension Pursuitmeter              08    27   -06    16    04    55    07    05    07    06    43
10. Compensatory Balance                      32    08    08    08    16    39    11    31   -10   -02    42
11. Controls Orientation                      05    18    46    14    20    00    18    24    36    13    54
12. Pursuit Confusion                         48    06    19    19    05    17    06    04    11    15    38
13. Drift Correction                          03    20    40    15    21    08    02    15    00    13    31
14. Discrimination Reaction Time              10     ?    37    50    17    14    08    34    10   -06    61
15. Complex Multiple Reaction                 10    18    07    41    06   -02    05    19    18    13    31
16. Santa Ana Finger Dexterity                11    03   -07    26    19    09    20    38    12    11    34
17. Coordination (printed)                    14   -01    03    00    09    03    42    24    00    02    26
18. Discrimination Pursuit (printed)          07    01    03    23    21    01    55    08    14   -02    43
19. Signal Discrimination (printed)           01    05    35    52    34   -07    40    10    07   -01    69
20. Complex Movements (printed)               06    03    13    09    40    01     ?    07    32    16    32
21. Coordinate Movements (printed)            10    09    36    18    44    00    19    02    11    06    43
22. Directional Control (printed)             10   -10    38    03    28    18    10    02    34   -11    41
23. Discrimination Reaction Time (printed)     ?    20    23    41     ?     ?    35    02     ?     ?    45
24. Criterion                                 64    23    22    02   -02   -04    07    08    07   -03    53
Σa²/k                                         08    06    06    06    05    04    04    04    03    01

* Decimal points are omitted.
? Entry illegible in the scanned original.
** Factors are identified as follows: I = Psychomotor Coordination I; II = Psychomotor Coordination II; III = Spatial Relations I; IV = Spatial Relations II; V = Integration; VI = Rate Control; VII = Perceptual Speed; VIII = Manual Dexterity; IX = Visualization; X = Residual.
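Because the factors were rotated orthogonally, the h² column in Table 1 is each variable's communality: the sum of its squared loadings across the ten factors. Orthogonal rotation redistributes variance among the factors but leaves these row sums unchanged. The short Python check below recomputes h² for three fully legible rows, with decimal points restored; any small discrepancy would come from rounding of the two-digit loadings.

```python
# Rows of Table 1 (factors I through X), decimal points restored.
ROWS = {
    "Plane Control": [.25, .50, .00, .07, .15, .15, .02, .24, .09, .05],
    "Rate Control": [.24, .31, .00, .11, .03, .58, .09, .01, .11, -.08],
    "Criterion": [.64, .23, .22, .02, -.02, -.04, .07, .08, .07, -.03],
}
TABLED_H2 = {"Plane Control": .43, "Rate Control": .53, "Criterion": .53}


def communality(loadings: list[float]) -> float:
    """Sum of squared loadings; equals h² only when the factors are orthogonal."""
    return sum(a * a for a in loadings)


for name, loadings in ROWS.items():
    print(f"{name:<14} computed {communality(loadings):.2f}  tabled {TABLED_H2[name]:.2f}")
```

The Criterion row's communality of about .53 is the quantity behind the statement, in the Conclusions, that over 50 per cent of the criterion variance was accounted for by the factors identified.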




Printed Tests: (code numbers are Navy desig- 
nations [9]) 


17. Coordination Test (NavMed 1215). The pathways
on a series of superimposed mazes must be
traced accurately and rapidly. 

18. Discrimination Pursuit (NavMed 1216). 
Twenty-five rows of 5 circles each are presented and 
these 5 circles are of varying diameters. Different 
problems are presented in which circles of certain 
diameters in certain columns are to be connected 
with certain other circles as rapidly and accurately 
as possible. 

19. Signal Discrimination (NavMed 1212). Each 
of a series of items presents 4 circles. One circle 
contains a zero, another a plus mark, and the re- 
maining two are empty. The task is to determine if 
the plus mark is right, left, above, below, or diagonal 
to the zero. 

20. Complex Movements (NavMed 1214). Each 
problem presents a vertical line and above it a hori- 
zontal line. On each of these lines is a circle and a 
dot. Below the lines is a grid of 49 squares, one of
which contains an “S.” The examinee indicates in 
the grid, from the letter “S,” the direction required 
to “move” the dots on both lines simultaneously 
into their respective circles. 

21. Coordinate Movements (NavMed 1211). Each 
problem presents a series of arrows which vary in 
curvature and represent different directions of
movement. The thickness of each arrow represents
distance of movement. To the left of these arrows are
a dot and a circle. The examinee selects the one 
symbol that will move the dot inside the circle. 

22. Directional Control (NavMed 1213). Each 
problem presents nine boxes side by side, with a 
target circle above the middle box. Also pictured is 
an arrow whose head indicates direction and whose 
length indicates distance of movement. The ex- 
aminee indicates from which box it must be launched 
to have its head reach the target. 

23. Printed Discrimination Reaction Time 
(CP634A). Each item simulates a setting of the 
apparatus test of the same name (see Test 14 above), 
except that responses are made by pencil in one of four
slots on answer sheets. The slots are arranged in the 
same up-down, left-right pattern as are the toggle 
switches on the apparatus version. 


The Criterion 


24. Graduation versus elimination of Navy midshipmen
from flying training, where eliminations
were due to flying deficiency. 


Results 


Ten factors were extracted from the intercorrelations
among these variables by the Thurstone centroid
method (13).² The axes defined by these centroid
factor loadings were then rotated orthogonally to
simple structure and positive manifold, using
Zimmerman’s graphical procedure (14). Table 1
presents the complete matrix of rotated factor loadings.

² The correlation matrix and the matrix of centroid
factor loadings have been deposited with the
American Documentation Institute. Order Document
No. 4726 from the ADI Auxiliary Publications
Project, Photoduplication Service, Library of
Congress, Washington 25, D. C., remitting in advance
$1.25 for microfilm or $1.25 for photocopies. Make
checks payable to Chief, Photoduplication Service,
Library of Congress.


Interpretation of the Factors


The rotated factors were interpreted for
psychological meaningfulness. Loadings above
.20 were considered significant. For presentation,
only loadings above .30 for test
variables and those above .20 for the pilot
criterion are listed in turn for each factor.

Factor I is identified as Psychomotor Coordination I.


Variable
No.   Test                                    Loading
6     Complex Coordination                    .50
12    Pursuit Confusion                       .48
4     Rotary Pursuit                          .47
5     Two-Hand Coordination                   .46
7     Rudder Control                          .40
10    Compensatory Balance                    .32
24    Criterion                               .64


Previous Air Force studies have consistently
identified a factor common to four of
the psychomotor tests which have been
operational (Variables 4, 5, 6, 7). This factor
was called Psychomotor Coordination and
defined broadly as representing the coordination
of the larger muscles of the body in
movements of moderate scope. The present
battery represents the first time the four tests
defining this factor have been administered in
combination with a large number of other
psychomotor tests. As a consequence, it
appears that this Psychomotor Coordination
factor has been split into several other
factors. Another possibility is that we have
simply uncovered additional psychomotor factors
present in these tests. The distinguishing
characteristic of the present factor appears to
be that fine, sensitive, highly controlled
adjustments are required in movements quite
restricted in scope. For example, in certain
tasks movement of controls into approximately
the proper position is first necessary,
but then more delicate adjustment of the con- 
trols may be necessary to locate the exact 
target position and to maintain it. Such very 
fine controlled muscular adjustments seem 
crucial at some stage of performance in all 
the tasks loaded on this factor. For the pres- 
ent, then, this factor is interpreted as repre- 
senting the ability to control muscular move- 
ments involved in making fine, accurate 
adjustments. It can also be seen that this 
factor contributes a large proportion of the 
variance in the pilot criterion. 
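The reporting rule stated earlier (test loadings above .30, and the criterion when above .20) and the claim about the criterion can both be checked against the Factor I column of Table 1. With orthogonal factors and standardized variables, a squared loading is the proportion of a variable's variance attributable to that factor, so the criterion loading of .64 corresponds to roughly 41 per cent of criterion variance. The Python sketch below uses only the apparatus tests (Variables 1 to 13) and the criterion (Variable 24) to keep it short; the function name is hypothetical.

```python
# Factor I loadings from Table 1: apparatus tests 1-13 and the criterion (24).
FACTOR_I = {
    1: .25, 2: .20, 3: .24, 4: .47, 5: .46, 6: .50, 7: .40,
    8: .15, 9: .08, 10: .32, 11: .05, 12: .48, 13: .03, 24: .64,
}
CRITERION = 24


def listed_loadings(column, test_cut=.30, criterion_cut=.20):
    """Return (variable, loading) pairs that pass the text's reporting rule.

    Tests are ordered by decreasing loading; the criterion is listed last.
    """
    kept = [
        (var, a) for var, a in column.items()
        if a > (criterion_cut if var == CRITERION else test_cut)
    ]
    return sorted(kept, key=lambda pair: (pair[0] == CRITERION, -pair[1]))


print(listed_loadings(FACTOR_I))
print(f"criterion variance due to Factor I: {FACTOR_I[CRITERION] ** 2:.0%}")
```

The filtered list reproduces the order of the Factor I table given above, with the criterion at the end.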

Factor II is identified as Psychomotor Co- 
ordination—II. 


Variable
No.   Test                                    Loading
7     Rudder Control                          .56
1     Plane Control                           .50
2     Multidimensional Pursuit                .38
6     Complex Coordination                    .37
5     Two-Hand Coordination                   .36
3     Rate Control                            .31
24    Criterion                               .23


This factor appears to involve coordination 
between muscle groups in making more gross 
adjustments, where the use of more than one 
body member is required. The tests most 
heavily loaded on this factor all emphasize 
these more extensive coordinate adjustments. 
In fact, the first four tests on this factor
involve coordinate leg movements. Three of
these four also involve arm movements, and
the fifth test involves coordinate movements 
of both arms only. It is true that several of 
these tests involve both Psychomotor Coordi- 
nation—I (fine adjustment) and Psychomotor 
Coordination—II. However, the present dis- 
tinction between these two factors rests pri- 
marily on the presence of Rotary Pursuit and 
Pursuit Confusion in Psychomotor Coordina- 
tion—I and their absence from the present 
factor. In both these tests skill around the 
target area is the important feature, no ad- 
justment is required from one target to an- 
other, and only one body member is involved. 
It can also be seen that the Psychomotor Co- 
ordination—II factor contributes in some de- 
gree to the pilot criterion. 

Factor III is identified as Spatial Relations 
I (stimulus interpretation). The precise defi- 
nition of the factor of Spatial Relations has 
been the subject of considerable study (e.g., 




3, 4, 15). It has been defined broadly by 
Guilford (3) as the ability to relate different 
stimuli to different responses, where either 
stimuli or responses are arranged in spatial 
order. However, it has not been clear whether 
the appreciation of spatial arrangement of 
stimuli or of responses separately is the key 
to the factor. Several more specific hypothe- 
ses have been proposed (e.g., 3, 15). Accord- 
ing to one hypothesis, it is the ability to make 
discriminations as to directions of motion 
(e.g., up, down, left, and right). Other hy- 
potheses suggest the essence of the space fac- 
tor could be (a) ability to perceive visual- 
spatial relationships, (b) ability to organize 
movements in spatially determined order, or
(c) ability to relate specific spatial locus or 
arrangement within the stimulus pattern with 
specific locus or arrangement within the re- 
sponse pattern. 

The present analysis contributes certain 
leads toward resolving these questions. The 
presence of a wider variety of printed and 
psychomotor tasks, with obvious spatial char- 
acteristics, has apparently allowed a separa- 
tion into at least two spatial factors. Each 
of these factors is consistent with more recent 
emphases (4, 15) that spatial relations in- 
volves orientation with respect to one’s own 
body. 

The first of these factors is defined by the 
following variables: 


Variable
No.   Test                                    Loading
11    Controls Orientation                    .46
13    Drift Correction                        .40
8     Direction Control                       .39
22    Directional Control (printed)           .38
14    Discrimination Reaction Time            .37
21    Coordinate Movements (printed)          .36
19    Signal Discrimination (printed)         .35
24    Criterion                               .22


The tests most heavily saturated with this 
factor involve the interpretation of the spatial 
relations of the stimuli before the proper re- 
sponse can be determined. Moreover, load- 
ings appear to be largest in the tests present- 
ing the most complex stimulus patterns, in 
which interpretation of the proper spatial
relationships of the components of the pattern
is the most difficult part of the task.
Orientation, however, is still with respect to one’s
own body, since in all the tasks there is a 




logical relationship between the responses re- 
quired and the spatial characteristics of the 
stimulus. The emphasis, however, appears to 
be on the ability to perceive and interpret 
these visual-spatial characteristics of the 
stimulus. Roff (12) identified a similar fac- 
tor in which the element of skill in respond- 
ing was apparently subordinate to the skill in 
differentiating the stimulus. Roff, however, 
does not refer to the spatial characteristics of 
the task and prefers to call the factor simply 
Visuo-Motor Discrimination. 

This is the spatial factor which contributes 
to the pilot criterion. 

Factor IV is identified as Spatial Relations 
II (response orientation or choice).


Variable
No.   Test                                    Loading
19    Signal Discrimination (printed)         .52
14    Discrimination Reaction Time            .50
23    Discrimination Reaction Time (printed)  .41
15    Complex Multiple Reaction               .41


It is involved in tasks which have a number 
of possible responses which are spatially dis- 
tinct. The important distinction between this 
factor and Spatial Relations I is that the pat- 
terns of stimuli need not be spatially arranged 
and need not be difficult to discriminate. The 
crucial parts of the tasks appear to be in de- 
cision and choice of response. This factor ap- 
pears to represent the abilities to make rapid 
discriminations as to directions of motion. 
Even the printed tests on this factor involve 
the making of an appropriate response (e.g., 
checking slots arranged in an up, down, right, 
left pattern, or drawing a series of lines in the 
correct directions). The Complex Multiple 
Reaction Test is the purest measure of this 
factor, and here the stimuli are not spatially 
arranged (discriminations are to be made pri- 
marily between colors of light and/or be- 
tween visual versus auditory cues). The re- 
sponses required are quite complex, however 
(one of three levers, or right or left foot 
pedal). Even the Finger Dexterity Test (with 
a loading of .26) requires a decision as to 
counterclockwise or clockwise movement of
pegs.³

³ Possible disturbing elements in our discussion of
these Spatial Relations factors are the loadings of
the Complex Coordination Test, which have generally
appeared on a Spatial Relations factor. This
test is not loaded on Spatial Relations I and has a
loading of only .22 on Spatial Relations II.


The present analysis does not allow a crucial
test of the distinction between these two
spatial factors since many of the tasks involve
spatial arrangements of both stimuli and
responses, but the present results do suggest the
kind of experimentation required to substantiate
or reject the present hypothesis. This
would require an analysis of a battery of tests,
some of which involve simple nonspatial
responses and complex spatial stimulus
characteristics, while others require complex choices
of responses arranged spatially, but involve
no spatial interpretations of stimuli (e.g., the
Complex Multiple Reaction Test).

Factor V is tentatively identified as Integration.
It appears in both printed and apparatus tests.

Variable
No.   Test                                    Loading
21    Coordinate Movements (printed)          .44
20    Complex Movements (printed)             .40
2     Multidimensional Pursuit                .36
6     Complex Coordination                    .36
5     Two-Hand Coordination                   .34
19    Signal Discrimination (printed)         .34


The apparatus tests with highest loadings on
this factor all require simultaneous responses
of more than one body member. Successful
performance in each of these tasks depends
on the proper coordination or merging of
movements of both hands or hands and feet.
For example, in the Two-Hand Coordination
Test, movements of one hand alone move the
target follower to the right or to the left,
while movements of the other hand alone
control the target follower toward or away from
the subject. Proper combinations of
movements together, however, can move the target
follower in any resultant direction. Similar
requirements are imposed in the other
apparatus tasks.

The presence of certain printed tests on
this factor suggests further hypotheses about
its nature. For example, in the Coordinate
Movements Test, one type of arrow represents
left-right movement, another type represents
up-down movement, and still another
type represents no movement. Thickness of
arrow represents distance of movement. The
examinee must select from five combinations
of these symbols the one symbol which will
give the proper resultant response (e.g., the 
arrow which will “move” a dot exactly inside 
a circle located in a certain direction and at 
a certain distance from it). In the Complex 
Movements Test, the examinee must choose 
the one response which will produce the proper 
extent and direction of “movement” in two 
separate dots. Any of the wrong choices will 
leave one or both of the dots short of the 
target circle. In either of these two printed 
tests the examinee must combine the information
provided by a number of separate symbols into a
single integrated response.
It therefore appears that this factor in- 
volves the ability to utilize and coordinate 
a number of disparate cues and activities 
quickly and accurately in order to produce an 
appropriate integrated response.⁴ Roff (12)
also identified a factor containing these same 
apparatus tests, which he called Integrated 
Coordination. However, these printed tests 
did not appear on this factor in his analysis. 
It appears that Roff’s factor is the same as 
our Psychomotor Coordination II and that 
our Integration factor was not identified in 
his analysis. 
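The common element identified here, combining separate directional components into one resultant, can be made concrete with vector addition. In the Two-Hand Coordination Test one handle supplies the left-right component and the other the toward-away component; in the Coordinate Movements Test the arrow types supply left-right and up-down components and arrow thickness supplies distance. The Python sketch below treats each answer symbol as a hypothetical displacement (dx, dy) and picks the single choice that carries the dot inside the circle. The coordinates, the circle radius, and the function names are invented for illustration and are not taken from the test materials.

```python
import math

Vec = tuple[float, float]  # a 2-D point or a hypothetical (dx, dy) displacement


def lands_inside(dot: Vec, center: Vec, radius: float, move: Vec) -> bool:
    """True if applying the move to the dot puts it within the target circle."""
    x, y = dot[0] + move[0], dot[1] + move[1]
    return math.hypot(x - center[0], y - center[1]) <= radius


def pick_symbol(dot: Vec, center: Vec, radius: float, choices: list[Vec]):
    """Index of the one choice that reaches the circle, or None if not unique."""
    hits = [i for i, move in enumerate(choices) if lands_inside(dot, center, radius, move)]
    return hits[0] if len(hits) == 1 else None


# The circle lies 3 units right of and 2 units above the dot; only choice 2
# combines both components, while the others leave the dot short or off target.
choices = [(3, 0), (0, 2), (3, 2), (2, 3), (0, 0)]
print(pick_symbol((0, 0), (3, 2), 0.5, choices))  # 2
```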
⁴ The relationship between this factor and the
Integration factors identified in the Air Force analyses
(3) is in doubt. The Air Force analyses failed to
find an Integration factor in the psychomotor tests.
However, the present Integration factor seems to be of a
different type from those identified previously.

Factor VI is defined as Rate Control.


Variable
No.   Test                                    Loading
3     Rate Control                            .58
9     Single Dimension Pursuitmeter           .55
10    Compensatory Balance                    .39


This factor is common only to certain of the
apparatus tests, and to none of the printed
tests. It appears to involve an aspect of
pursuit. Each of the tasks requires the
examinee to make anticipatory adjustments
relative to changes in speed and direction of a
continuously moving object. For example, in
the Rate Control Test, the examinee must
keep a pointer on a target line as the line
moves across a scale with frequent changes in
rate of acceleration and direction. In the
Single Dimension Pursuit task, a horizontal
line must be kept in a null position as it
deviates in an unpredictable manner as to
direction and rate of movement. The nature 
of the Compensatory Balance Test may seem 
ostensibly to differ from the other tests on 
this factor. It can be seen, however, that 
judging and compensating for the rate of 
movement of the steel ball as it approaches 
the various choice points in the maze pattern 
is crucial in keeping the ball in the correct 
pathway. An interesting feature of this factor 
is that it cuts across the traditional categoriza- 
tion of pursuit tasks into “following pursuit” 
(e.g., Rate Control)⁵ and “compensatory pursuit”
(e.g., Single Dimension Pursuit) tasks.
These results suggest that from the point of 
view of individual differences, this distinction 
may be arbitrary, and the nature of such 
tasks may better be thought of in terms of a 
third underlying variable. This is confirmed 
by Roff’s analysis (12) which found a factor 
defined only by the Rate Control and Single 
Dimension Pursuitmeter tests. He defined 
his factor as Eccentric Pursuit, but the pres- 
ent analysis suggests the nature of the factor 
may be broader than this. 
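The anticipatory adjustment that seems to characterize the Rate Control factor can be illustrated with a toy tracking model. Nothing here models the actual apparatus: the target trajectory and the one-step linear extrapolation are assumptions chosen only to show why a response that takes the target's current rate of movement into account tracks a continuously changing target more closely than one that simply follows its last observed position.

```python
import math


def target(t: int) -> float:
    """Hypothetical target position whose speed and direction keep changing."""
    return math.sin(0.1 * t) + 0.5 * math.sin(0.037 * t)


def lagging_response(t: int) -> float:
    """Aim at where the target was on the previous step (no anticipation)."""
    return target(t - 1)


def anticipating_response(t: int) -> float:
    """Extrapolate the previous step's velocity one step ahead."""
    velocity = target(t - 1) - target(t - 2)
    return target(t - 1) + velocity


def mean_abs_error(response, steps: int = 500) -> float:
    """Average absolute tracking error over steps 2 .. steps-1."""
    errors = [abs(response(t) - target(t)) for t in range(2, steps)]
    return sum(errors) / len(errors)


print(f"lagging:      {mean_abs_error(lagging_response):.4f}")
print(f"anticipating: {mean_abs_error(anticipating_response):.4f}")
```

With this smooth trajectory the extrapolating response has the smaller mean error; the comparison only illustrates the idea of anticipation and says nothing about the tests themselves.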

Factor VII is identified as the well-estab- 
lished factor of Perceptual Speed. 


Variable
No.   Test                                    Loading
18    Discrimination Pursuit (printed)        .55
17    Coordination (printed)                  .42
19    Signal Discrimination (printed)         .40
23    Discrimination Reaction Time (printed)  .35


This factor involves facility in making rapid 
comparisons of visual forms and the notation 
of similarities and differences in form and
detail. The tests with highest loadings are quite
similar in nature to those previously
identified with this factor (e.g., picking out 
certain forms presented with a number of ir- 
relevant forms, tracing maze lines, etc.). This 
factor does not appear in any of the present 
apparatus tests. This factor is practically 
identical to one identified in Roff’s analysis 
(12). 

Factor VIII is identified as Manual Dex- 
terity. 


⁵ Rotary Pursuit, another “following pursuit” task,
has a loading of .24 on the Rate Control factor and
differs from the other tasks in that the course and
rate of its target are more predictable.




Variable
No.   Test                                    Loading
16    Santa Ana Finger Dexterity              .38
4     Rotary Pursuit                          .35
14    Discrimination Reaction Time            .34
10    Compensatory Balance                    .31


This factor appears to be the same one identi- 
fied in a few of the previous Air Force analy- 
ses as Psychomotor Precision and tentatively 
defined as the ability to make precise ma- 
nipulations under speed conditions. However, 
this definition appears much too broad. Other 
definitions have identified this factor with 
finger dexterity. The present interpretation 
of this factor as Manual Dexterity appears 
much less strained, especially in view of the 
significant loadings of Rotary Pursuit and 
Compensatory Balance, which do not em- 
phasize skillful finger movements but do in- 
volve skillful arm-hand movements. Simi- 
larly, the Discrimination Reaction Time Test 
may be viewed as involving skillful, well-di- 
rected arm-hand movements. Moreover, in 
our previous studies involving dexterity tests 
(1, 2) the Santa Ana Test was found to con- 
tain a high loading on a factor emphasizing 
more gross arm and hand movements (Manual 
Dexterity) and a low loading on a factor 
identified as emphasizing finger coordination 
(Finger Dexterity). Manual Dexterity is also 
distinguished from Psychomotor Coordination II
in that the latter involves more coordination
between muscle groups, is not entirely
restricted to arm-hand movements, and does 
not appear as concerned with speed. That 
this factor does not contribute to the pilot 
criterion confirms earlier findings (3).
The factor has been found important in the 
bombardier criterion, however. 
Factor IX is identified as Visualization. 


Variable
No.   Test                                    Loading
8     Direction Control                       .44
11    Controls Orientation                    .36
22    Directional Control (printed)           .34
20    Complex Movements (printed)             .32


It has been defined as the ability to make 
mental manipulations of visual images. In
established tests of the Visualization factor, it
is usually necessary to move, turn, twist, or 
rotate the stimulus situation in imagination 
and to recognize a new appearance, position, 
or condition after prescribed manipulations. 
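Tests of Visualization require a transformation of this kind to be carried out in imagination. As an illustration only, using a hypothetical grid encoding that is not taken from any of the tests analyzed here, a quarter-turn rotation of a point pattern, and a check of whether a second pattern is a rotated copy of the first, can be written in Python as follows. The check rotates about the origin and does not allow the pattern to be shifted.

```python
Point = tuple[int, int]


def rotate_quarter_turns(pattern: set[Point], turns: int) -> set[Point]:
    """Rotate grid points counterclockwise about the origin by turns * 90 degrees."""
    rotated = set(pattern)
    for _ in range(turns % 4):
        rotated = {(-y, x) for x, y in rotated}
    return rotated


def is_rotation_of(candidate: set[Point], original: set[Point]) -> bool:
    """True if some quarter-turn rotation of original matches candidate exactly."""
    return any(rotate_quarter_turns(original, k) == candidate for k in range(4))


L_SHAPE = {(0, 0), (1, 0), (2, 0), (2, 1)}
print(is_rotation_of({(0, 0), (0, 1), (0, 2), (-1, 2)}, L_SHAPE))  # True (one quarter turn)
print(is_rotation_of({(0, 0), (1, 0), (2, 0), (2, -1)}, L_SHAPE))  # False (a mirror image)
```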




A separation between visualization and spa- 
tial-relations factors occurred first in certain 
Air Force analyses (3). The two apparatus 
tests on this factor also have strong loadings 
on Spatial Relations I. Thus, interpretation 
of the spatial components of the complex 
stimulus patterns presented in relation to the 
responses required has been hypothesized to
account in some measure for performance on 
these tests. It seems reasonable from the na- 
ture of these tasks, however, that the ability 
to manipulate or rotate the entire stimulus 
pattern mentally into a new position also is 
necessary for effective performance on these 
particular, more difficult tasks.° It should be 
pointed out that none of the better estab- 
lished tests of Visualization have been in- 
cluded in the present analysis. However, each 
of the four tests loaded on the present factor 
exhibit higher correlations with such estab- 
lished tests of visualization (e.g, Pattern 
Comprehension, Spatial Visualization) than 
do tests not loaded on this factor (see 9). 

Factor X is a residual factor containing 
only insignificant loadings. 


Conclusions 


1. Of the nine factors identified, four were 
confined to the apparatus psychomotor tests. 

a. Two psychomotor factors were identi- 
fied among the tests which had previously 
identified a single factor called Psychomotor 
Coordination in many previous Air Force 
analyses. One of these factors appears to 
emphasize fine, highly controlled sensitive ad- 
justments in movements quite restricted in 
scope. 

b. The second factor seems to involve more 
gross coordination in making more extensive 
movements and was best measured by tests 
requiring the use of several body members 
simultaneously. 

c. The status of the factor previously called 
Psychomotor Precision in the Air Force re- 
search was clarified somewhat. This factor 
appears more adequately described as Manual 
Dexterity. 

d. Another psychomotor factor, called Rate Control, was restricted to certain apparatus tests which required the ability to make continuous motor adjustments relative to changes in speed or direction of a continuously moving object. This factor was found to cut across the traditional classification of “compensatory” versus “following” types of pursuit tasks.

⁶ This finding supports the recent contention by Zimmerman (15) that Spatial Relations and Visualization may lie on some kind of difficulty continuum.

2. Four factors were found measurable by 
both apparatus tests and certain printed tests. 

a. An Integration factor was found in tests 
which required the ability to utilize and ap- 
ply a number of disparate cues and activities 
quickly into an integrated resultant response. 
Its relation to integration factors found in 
previous research needs to be determined. 

b. Two separate Spatial Relations factors 
were isolated which suggest leads for resolv- 
ing certain questions in this area. One of 
these factors emphasizes the ability to inter- 
pret spatial characteristics of the stimulus 
situation, while the other is more identified 
with directional discrimination and orienta- 
tion of movement patterns. 

c. The separation of a Visualization factor 
from Spatial Relations has been confirmed. 
Of interest in this connection is the presence 
of Visualization in certain of the perceptually 
more difficult apparatus tests. 

3. A Perceptual Speed factor was found 
confined to printed tests. 

4. Over 50 per cent of the variance in the criterion of pilot proficiency was accounted for by the factors identified. Indications are that better tests are especially needed for the most valid factor, Psychomotor Coordination I. The other two factors contributing significantly were Psychomotor Coordination II and Spatial Relations I.

5. Contrary to previous belief that motor skills are narrow in scope and highly specific to the task, the present results confirm that there are certain broad group factors of psychomotor skill which may account for performance on a wide variety of different psychomotor tasks.

6. In general, some of the factors defined 
in psychomotor tests may also be sampled by 
printed tests. Although the distinction be- 
tween “motor” and “nonmotor” factors may 
be considered somewhat arbitrary, there is 
evidence that psychomotor tests sample some 
factors not covered by any other kinds of 
tests. 
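Conclusion 4 rests on a simple identity: when the factors are uncorrelated, the proportion of criterion variance they account for is the sum of the criterion's squared loadings. The sketch below illustrates that arithmetic under the assumption of orthogonal factors. The factor names come from the text, but the loading values are hypothetical placeholders, not the loadings obtained in this study.

```python
# Sketch only: criterion variance accounted for by ORTHOGONAL factors.
# The loadings used below are HYPOTHETICAL placeholders, not this study's values.


def contributions(loadings: dict[str, float]) -> dict[str, float]:
    """Variance share of each factor: the squared loading of the criterion on it."""
    return {name: a * a for name, a in loadings.items()}


def variance_accounted_for(loadings: dict[str, float]) -> float:
    """Communality of the criterion: sum of squared loadings (orthogonal factors only)."""
    return sum(contributions(loadings).values())


if __name__ == "__main__":
    criterion = {  # hypothetical loadings of a pilot-proficiency criterion
        "Psychomotor Coordination I": 0.50,
        "Psychomotor Coordination II": 0.40,
        "Spatial Relations I": 0.30,
    }
    for factor, share in contributions(criterion).items():
        print(f"{factor}: {share:.2f}")
    print(f"total: {variance_accounted_for(criterion):.2f}")  # 0.25 + 0.16 + 0.09
```

With correlated (oblique) factors this simple sum no longer holds, because the cross-products of loadings and factor correlations also contribute.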


Received June 15, 1955. 


Edwin A. Fleishman and Walter E. Hempel, Jr. 


References 


1. Fleishman, E. A. Dimensional analysis of psychomotor abilities. J. exp. Psychol., 1954, 48, 437-454. (A factorial study of psychomotor abilities. USAF, Personnel & Training Res. Cent., Res. Bull., 1954, No. 54-15 [AFPTRC-TR-54-15].)

2. Fleishman, E. A., & Hempel, W. E., Jr. A factor analysis of dexterity tests. Personnel Psychol., 1954, 7, 15-32.

3. Guilford, J. P. (Ed.) Printed classification tests. Washington: U. S. Government Printing Office, 1947. (AAF Psychol. Program Res. Rep. No. 5.)

4. Guilford, J. P., Fruchter, B., & Zimmerman, W. S. Factor analysis of the Army Air Forces Shepard Field Battery of experimental aptitude tests. Psychometrika, 1952, 17, 45-68.

5. Hempel, W. E., Jr., & Fleishman, E. A. A factor analysis of physical proficiency and manipulative skill. J. appl. Psychol., 1955, 39, 12-16. (USAF, Personnel & Training Res. Cent., Res. Bull., 1954, No. 54-34 [AFPTRC-TR-54-34].)

6. Melton, A. W. (Ed.) Apparatus tests. Washington: U. S. Government Printing Office, 1947. (AAF Aviat. Psychol. Program Res. Rep. No. 4.)

7. Page, H. E. (Ed.) The pilot candidate school research program: historical background and organization. USAF Sch. Aviat. Med., Joint AF-Bu Med Res. Project Rep. No. 1, 1951.

8. Payne, R. B. (Ed.) The pilot candidate selection research program: implementation procedures. USAF Sch. Aviat. Med., Joint AF-Bu Med Project Rep. No. 2, 1950.

9. Payne, R. B. (Ed.) The pilot candidate selection research program, IV: Test validities and intercorrelation. USAF Sch. Aviat. Med., Joint AF-Bu Med Project Rep. No. 4, 1952.

10. Roff, M. F. A factorial study of tests in the perceptual area. Psychometric Monogr., 1952, No. 8.

11. Roff, M. F. Personnel and classification procedures: spatial tests. USAF Sch. Aviat. Med. Project Rep., 1951.

12. Roff, M. F. (Ed.) The pilot candidate selection research program, V: A factorial study of the motor aptitudes area. USAF Sch. Aviat. Med., Joint AF-Bu Med Res. Project Rep. No. 5, 1953.

13. Thurstone, L. L. Multiple-factor analysis. Chicago: Univer. of Chicago Press, 1947.

14. Zimmerman, W. S. A simple graphical method for orthogonal rotation of axes. Psychometrika, 1946, 11, 51-55.

15. Zimmerman, W. S. A revised orthogonal rotational solution for Thurstone’s original primary abilities test battery. Psychometrika, 1953, 18, 77-93.


The Journal of Applied Psychology
Vol. 40, No. 2, 1956


Spatial Factors in Check Reading of Dial Groups¹


Robert S. Lincoln² and Emanuel Averbach


The Johns Hopkins University 


It would seem reasonable to expect that reading habits will influence the pattern of scanning that an observer might use in viewing visual displays. If displays are scanned in a manner similar to that used in reading, there should be some relationship between the accuracy with which elements of the display are read and the spatial location of those elements. This idea is in line with the results of an experiment by White, Warrick, and Grether (2), who found that observers tended to fixate initially on the upper half of a panel of check dials while they were looking for misaligned pointers in the panel. Furthermore, their observers missed 22% more deviant pointers in the bottom half of the panel than in the top half.
The present experiment was designed to test 
more completely the effects of spatial location 
on the accuracy with which check dials are 
read as well as the relation between this ac- 
curacy and reading habits. The design also 
made it possible to test the effects of differ- 
ent durations of exposure of the dial displays. 


Method 


Apparatus. Figure 1 is a schematic diagram of 
one of the panels used in this experiment. On the 
actual panels the dials were 1.6 in. in diameter, and 
were separated from each other by a space of .2 of an inch. For purposes of analysis the panels were divided into four quadrants with three positions in each quadrant. In Fig. 1 the dotted lines separate the four quadrants of the panel, and the letters within the dials indicate the three quadrant positions, “C” indicating corner, “S” indicating side, and “M” indicating a middle position with respect to the
entire panel. No distinction was made between the 
two side positions in each quadrant. They were 
treated as one position. The letters, words, and 
dotted lines did not appear on the test panels. The 
observer’s (O’s) task was to detect the pointers that 


¹ This experiment was done under Contract N5ori-166, Task Order 1, between the Office of Naval Research and The Johns Hopkins University. This is Report No. 166-I-197, Project Designation No. NR 145-089, under that contract.

² Now with the Systems Development Division, The RAND Corporation, Santa Monica, California.


deviated from the null position. The Os indicated 
their detections by putting check marks in the ap- 
propriate spaces on answer sheets that resembled the 
dial panels. The pointers that were misaligned al- 
ways deviated 15 degrees in a clockwise direction 
from the null position. There was a 7-degree sepa- 
ration between the deviant pointers and the upper 
edge of the normal range indicated on each dial face. 

Fig. 1. Schematic diagram of a dial panel. The dotted lines separate the quadrants of the panel. The letters indicate positions within quadrants. The letter “C” indicates a corner position, “S” indicates a side position, and “M” indicates a middle position.

A total of 42 panels was used. Four deviant pointers appeared on each panel. Panels were divided into 14 sets of three panels each. In all sets a deviant pointer appeared once in each of the three positions within each of the quadrants, making a total of 12 deviant pointers in each set of three panels. Combinations of deviant pointers within a panel were chosen in a random manner. Since there were two side positions in each quadrant, deviant pointers were randomly assigned between them. An effort was made to ensure that on most of the individual panels there were deviant pointers in each quadrant and in each of the three quadrant positions. Figure 1, for example, shows deviant pointers in: (a) the top-left quadrant, corner position; (b) the top-right quadrant, corner position; (c) the bottom-left quadrant, middle position; and (d) the bottom-right quadrant, side position.

³ After the experiment had been run, an error was discovered in the construction of one panel in Set 11. In this set, one middle position was omitted, and a side position was represented in its place. The effect of this error was negligible.
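The set construction described above can be sketched as a small randomization routine. This is a hypothetical reconstruction, not the authors' procedure. Shuffling each quadrant's three positions across the three panels of a set guarantees that every quadrant-by-position cell occurs exactly once per set and that each panel receives one deviant pointer per quadrant. An optional rejection step then requires every panel to contain all three positions, which is stricter than the "most of the individual panels" the text describes. The labels `side-a` and `side-b` for the two side dials in a quadrant are assumptions.

```python
import random

QUADRANTS = ("top-left", "top-right", "bottom-left", "bottom-right")
POSITIONS = ("corner", "side", "middle")
SIDE_DIALS = ("side-a", "side-b")  # hypothetical names for the two side dials


def base_position(dial: str) -> str:
    """Collapse the two side dials into the single 'side' position."""
    return "side" if dial in SIDE_DIALS else dial


def build_set(rng: random.Random, require_all_positions: bool = True,
              max_tries: int = 1000) -> list[list[tuple[str, str]]]:
    """Return three panels, each a list of (quadrant, dial) deviant pointers.

    Shuffling each quadrant's three positions across the three panels puts every
    quadrant x position cell in the set exactly once (12 deviants) and gives every
    panel exactly one deviant per quadrant (4 per panel).
    """
    for _ in range(max_tries):
        panels: list[list[tuple[str, str]]] = [[], [], []]
        for quadrant in QUADRANTS:
            order = list(POSITIONS)
            rng.shuffle(order)  # order[i] is the position used on panel i
            for panel, position in zip(panels, order):
                dial = rng.choice(SIDE_DIALS) if position == "side" else position
                panel.append((quadrant, dial))
        # Optionally reject sets in which some panel lacks one of the three positions.
        if not require_all_positions or all(
            {base_position(d) for _, d in panel} == set(POSITIONS) for panel in panels
        ):
            return panels
    raise RuntimeError("no arrangement met the constraints")


if __name__ == "__main__":
    rng = random.Random(0)
    for number, panel in enumerate(build_set(rng), start=1):
        print(number, panel)
```

Calling `build_set` 14 times with one generator would produce a full complement of 14 sets (42 panels) under these assumptions.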

The panels were displayed in an electronic tachis- 
toscope that has been previously described (1). The 
panels were viewed at a distance of 23 in. from the 
O’s eyes. The brightness of the panels was approxi- 
mately 2.7 millilamberts. Light adaptation was 
maintained between exposures with a special adapt- 
ing light. 

Procedures. Three different groups of 14 under- 
graduate students each were used in the experiment. 
For one group the panels were exposed for a period 
of .35 sec. The exposure for the second group was 
.70 sec., and for the third group was 1.40 sec. The 
14 sets of panels were presented to the Os in orders 
determined by a 14 × 14 Latin square. One square
was used for all three groups. The same 14 sets of 
panels were presented twice to each O. A 5-min. 
rest break was given after the first presentation of 
the panels. 
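Presentation order was counterbalanced with a 14 × 14 Latin square, in which each set appears once in every serial position across the 14 observers of a group. The square actually used is not reported. The sketch below assumes a simple cyclic construction, where row r begins with set r + 1 and wraps around, and includes a checker for the Latin property.

```python
def cyclic_latin_square(n: int) -> list[list[int]]:
    """Row r, column c holds set number ((r + c) mod n) + 1."""
    return [[(r + c) % n + 1 for c in range(n)] for r in range(n)]


def is_latin_square(square: list[list[int]]) -> bool:
    """True if every row and every column is a permutation of 1..n."""
    n = len(square)
    symbols = set(range(1, n + 1))
    if not all(len(row) == n and set(row) == symbols for row in square):
        return False
    return all({row[c] for row in square} == symbols for c in range(n))


if __name__ == "__main__":
    square = cyclic_latin_square(14)
    # One row per observer in a group of 14; the same square serves all three groups.
    for observer, order in enumerate(square, start=1):
        print(f"O{observer:02d}:", order)
```

A cyclic square balances serial position but not immediate carry-over from one set to the next. A Williams-type square would also balance which set precedes which, at the cost of a slightly more involved construction.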

Before the test panels were presented, all Os re- 
ceived practice in identifying various symbols placed 
in different positions on the practice panels. They 
were also shown drawings that indicated the null 
position and the appearance of a deviating pointer. 
The Os were not told the number of deviant point- 
ers that would appear on the panels. They were in- 
structed not to guess, but were encouraged to try to 
indicate the positions of all of the deviating point- 
ers that they actually detected. The data from two 
Os were discarded because they very obviously failed to follow the instructions concerning guessing.


Results 


Most of the analysis of results concerns the percentages of deviant pointers that were detected under the various experimental conditions. It sometimes happened, however, that Os would indicate a deviant pointer in a location where the pointer did not deviate. These errors of localization will also be described.

[Figure 2: percent deviations detected (ordinate, 0 to 100) plotted against sets (abscissa, 1 to 28) for exposure durations of .35, .70, and 1.40 sec.]

Fig. 2. Practice curves for different durations of exposure.

All statements of statistical significance are 
based upon nonparametric tests. When a 
comparison involved two different groups of 
Os, a test for unpaired replicates was used. 
When a comparison involved the same group 
of Os under a number of conditions, Fried- 
man’s chi-square test was used. Both of 
these tests have been described by Wilcoxon 
(3). 
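Both nonparametric tests mentioned above reduce to rank computations. The sketch below uses the standard textbook formulas, which may differ in detail from the computational shortcuts in Wilcoxon (3). The unpaired comparison uses the rank sum of one group within the pooled data. Friedman's statistic ranks each observer's scores across the k conditions. For large samples the Friedman value is referred to the chi-square distribution with k - 1 degrees of freedom. The sketch does not compute p values and applies no tie correction.

```python
def midranks(values: list[float]) -> list[float]:
    """Rank values 1..n; tied values share the mean of the ranks they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # mean of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def rank_sum(group_a: list[float], group_b: list[float]) -> float:
    """Two-sample (unpaired) rank-sum statistic: total rank of group_a in the pooled data."""
    ranks = midranks(list(group_a) + list(group_b))
    return sum(ranks[: len(group_a)])


def friedman_chi_square(scores: list[list[float]]) -> float:
    """Friedman statistic; scores[i][j] is observer i's score under condition j.

    Each observer's scores are ranked across the k conditions, and
    chi2_r = 12 / (n k (k + 1)) * sum(R_j ** 2) - 3 n (k + 1), with no tie correction.
    """
    n, k = len(scores), len(scores[0])
    totals = [0.0] * k
    for row in scores:
        for j, r in enumerate(midranks(row)):
            totals[j] += r
    return 12 / (n * k * (k + 1)) * sum(t * t for t in totals) - 3 * n * (k + 1)
```

As a check, four observers who all order three conditions identically give `friedman_chi_square([[1, 2, 3]] * 4) == 8.0`, the maximum possible value n(k - 1).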

Effects of practice and duration of exposure. 
The curves shown in Fig. 2 reflect both the 
effects of practice and duration of exposure. 
The breaks in the curves represent the rest 
pause that separated the two presentations of 
the same 14 sets of panels. These Os im- 
proved their accuracy considerably, but most 
of the improvement took place in the first 
half of the experimental period. 

The effects of exposure duration are very 
consistent and highly significant (p < .01). 
Since the durations are themselves geometri- 
cally related, the relation between percentage 
of detections and exposure duration is non- 
linear during the first half of the practice pe- 
riod. This relationship appears to become 
more nearly linear during the second half of 
the practice period.

Despite the obvious increase in the percentage of detections that occurred with longer exposures, Os made significantly more errors of localization as exposure time increased (p < .01). The longer the exposure, the more frequently the Os indicated deviant pointers in dials where deviations did not exist. This effect may have occurred because the Os saw more deviant pointers at the longer exposures but were unable to remember exactly where they had seen them. With the shorter exposures they saw fewer deviant pointers and, therefore, may have had less trouble remembering where they had seen them. At the longest exposure 9.7% of the reports of deviant pointers involved an error of localization. Since the Os were instructed not to guess, it is likely that the curves in Fig. 2 show the percentages detected and remembered. The curves apparently underestimate the number of deviant pointers actually detected.

[Figure 3: bar chart of percent deviations detected in the top-left, top-right, bottom-left, and bottom-right quadrants at exposure durations of .35, .70, and 1.40 sec.]

Fig. 3. Percentage of detections in various quadrants for different exposures.

Quadrant effects. If Os did scan the dial 
panels as though they were reading them, the 
smallest number of detections would be ex- 
pected in the bottom-right quadrant. Pro- 
gressively more detections would be expected 
in the bottom-left and top-right quadrants. 
The greatest number of detections would be 
expected in the top-left quadrant. 
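The prediction above can be expressed as a sort. Each quadrant is assigned a (row, column) cell, and the cells are ordered as a reader of English would reach them. This is an illustrative sketch only. It assumes strict row-by-row reading, which places the top-right quadrant ahead of the bottom-left one, a finer ordering than the text commits to.

```python
# (row, column) of each quadrant on the panel: row 0 is the top, column 0 the left.
QUADRANT_CELLS = {
    "bottom-right": (1, 1),
    "top-left": (0, 0),
    "bottom-left": (1, 0),
    "top-right": (0, 1),
}


def reading_order(cells: dict[str, tuple[int, int]] = QUADRANT_CELLS) -> list[str]:
    """Quadrants in the order a left-to-right, top-to-bottom reader reaches them."""
    return sorted(cells, key=lambda name: cells[name])


def predicted_fewest_to_most(cells: dict[str, tuple[int, int]] = QUADRANT_CELLS) -> list[str]:
    """Hypothesis: quadrants reached later in reading order yield fewer detections."""
    return list(reversed(reading_order(cells)))


if __name__ == "__main__":
    print(predicted_fewest_to_most())
```

Run as a script, it prints `['bottom-right', 'bottom-left', 'top-right', 'top-left']`, the predicted order from fewest to most detections.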

Figure 3 shows the percentage of detections by quadrant. For each duration of exposure the percentage of detections increases in the expected manner. The differences among these quadrants are significant (p < .01). Figure 3 also indicates that the effects of exposure are very consistent for each quadrant considered separately.

Effects of position. Within each quadrant separate dials were located either on a side, in a corner, or in a middle position on the panel. The experimental panels were designed so that deviant pointers appeared in one or the other of the side positions as often as in the corner or middle positions. Because of this arrangement it is possible to compare the number of detections made in each of these three positions within and between quadrants. Of particular interest are the practice curves obtained for each position considered separately. As Fig. 2 indicated, performance improved considerably in the first half of the practice period. In that figure, however, the different positions were pooled. It would have been possible to obtain the curves of Fig. 2 if improvement had been shown in detecting deviant pointers in only one position while performance remained stable for the other positions. Actually, improvement was observed for each of the three positions. This result is pictured in Fig. 4, in which exposures and quadrants are pooled. The differences among positions are significant (p < .01). The data from only the first half of the experiment are pictured in Fig. 4 because most of the practice effect appeared in the early sets. The relative positions of the curves remained the same in the second half of the experiment.

No fixation point was provided on the dial panels used in this experiment. As Fig. 4 shows, however, the greatest percentage of detections was made in the middle positions on the panels from the very beginning of the experiment. This result suggests that these Os immediately established their own initial fixation point in the center of each panel. Subsequent fixations on a panel probably followed established reading habits, as the bars in Fig. 3 indicated.

[Figure 4: percent deviations detected (ordinate, 0 to 100) plotted against sets (abscissa, 1 to 14) for middle, corner, and side positions.]

Fig. 4. Practice curves for different dial positions.

[Figure 5: bar chart of percent deviations detected for side, corner, and middle positions within the bottom-right, bottom-left, top-right, and top-left quadrants.]

Fig. 5. Percentage of detections for different positions within quadrants.

In Fig. 4 the aberrant points in Set 11 for the middle and side positions seem to have resulted from the error in panel make-up that was previously described. One panel contained an extra side position and was short one middle position; Os detected about the same absolute number of deviant pointers as they detected in Sets 10 and 12. Apparently these Os were detecting as many of the deviant pointers as they could, and a few more or less in one position did not make much difference. Therefore, when the number of detections is plotted in Fig. 4 as a percentage of possible detections, the percentage of actual detections becomes abnormally high for the middle positions and low for the side positions.
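The Set 11 anomaly is a denominator effect. The sketch below uses hypothetical detection counts for a single observer and holds them constant, as the text suggests. It shows how the same absolute number of detections yields an inflated middle-position percentage and a deflated side-position percentage when the set holds 3 middle and 5 side deviants instead of 4 and 4.

```python
def percent_detected(detected: int, possible: int) -> float:
    """Detections as a percentage of the deviant pointers actually presented."""
    return 100.0 * detected / possible


# Possible deviants per position in a correctly built set: one per quadrant.
NORMAL_SET = {"middle": 4, "side": 4}
# Set 11 as built: one middle position replaced by a side position.
SET_11 = {"middle": 3, "side": 5}
# HYPOTHETICAL absolute detections, assumed unchanged from Sets 10 and 12.
DETECTED = {"middle": 3, "side": 2}

if __name__ == "__main__":
    for position, hits in DETECTED.items():
        normal = percent_detected(hits, NORMAL_SET[position])
        flawed = percent_detected(hits, SET_11[position])
        print(f"{position}: {normal:.0f}% in a normal set vs {flawed:.0f}% in Set 11")
```

Run as a script, it prints `middle: 75% in a normal set vs 100% in Set 11` and `side: 50% in a normal set vs 40% in Set 11`.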

Since the three positions were equally represented in each of the quadrants, it is possible to compare the frequency of detection for each position within each quadrant. The results of this comparison are pictured in Fig. 5, in which exposures are pooled. It is apparent that the positional effects are highly consistent within the various quadrants. In addition, the percentage detected for each position changes quite consistently between quadrants. Deviant pointers were detected about 86% of the time when located in the middle position of the top-left quadrant with an exposure of 1.40 sec. They were detected about 5% of the time when located in a side position of the bottom-right quadrant with an exposure of .35 sec.


Summary 


Observers were required to detect deviant pointers within a display panel of 16 circular dials. For each dial the null point was located in the 9 o'clock position. Throughout the experiment the spatial locations of the deviant pointers within a panel were controlled so that it was possible to determine the percentage of deviations detected as a function of quadrant location and position within quadrants. The consistency of these spatial effects was determined over three durations of panel exposure.

The results showed that spatial location was an important determinant of the number of detections that were made. The pattern of detections that appeared seems to confirm the idea that the scanning habits which observers use are highly related to previously learned reading habits.


Received May 31, 1955. 


References 


1. Merryman, J. G., & Allen, H. E. An improved electronic tachistoscope. Amer. J. Psychol., 1953, 66, 110-114.

2. White, W. J., Warrick, M. J., & Grether, W. F. Instrument reading. III: Check reading of instrument groups. J. appl. Psychol., 1953, 37, 302-307.

3. Wilcoxon, F. Some rapid approximate statistical procedures. New York: American Cyanamid Co., 1949.




The Contribution of Lecture Supplements to the Effectiveness of an Attitudinal Film¹


Frank T. Staudohar and Robert G. Smith, Jr. 


Air Force Personnel and Training Research Center, Lackland Air Force Base

¹ This is a summary of a thesis submitted by the senior author in partial fulfillment of the requirements for the degree of Master of Science at Trinity University, San Antonio, Texas. The investigation was carried out under the Air Force Personnel and Training Research Center in support of Project No. 7708. Permission is granted for reproduction, translation, publication, and use or disposal in whole or in part by or for the United States Government.

This study had its beginning in the observation of an operational problem in military training. Military training for airmen and officers is highly concerned with changing attitudes and motivations. Hollywood films are frequently available which bear in a general way upon desired changes in attitudes. Yet these films, despite the interest they arouse, are seldom specifically aimed at a given training objective. How can these films be used more effectively to accomplish specific objectives which involve attitude change?

It has been established in many studies that attitudes can be changed by appropriate motion picture films (2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12). The changes effected by the films frequently have been shown to last over a period of months. At the same time, it has been demonstrated (4, 13) that the effect of the film is likely to be specific, rather than general.

This report describes a study which evaluated one method of making attitudinal films more effective in the accomplishment of specific training objectives. The method involved the use of supplementary lectures in addition to the film. The purpose of the lectures was to stress to the subjects the significant scenes and events in the movie which were related to a specific training objective. Similar lectures have been found to increase the effectiveness of films in teaching facts (13).


Method


Subjects. Air Force basic military trainees were used as subjects (Ss). They were in the second week of a basic training period of approximately 10 weeks.

Because of limitations of space and military con- 
trol over the Ss, the sampling unit used in the ex- 
perimental part of the study was the flight, not the 
individual. A total of 16 basic training flights were 
used. The number of men per flight averaged 55. 
There was a total of 876 Ss used in the experimental 
phase of the study, and 311 in a preliminary scale- 
developing phase. 

The film. The commercial motion picture Twelve 
O'Clock High was chosen for this study because it 
was generally relevant to the development of a fa- 
vorable attitude toward discipline. 

Its story is that of an American Air Force bomber 
group in England during the early part of World 
War II. This group was demoralized and suffering 
heavy losses. Its commander had become too closely
identified with his men and could not effectively 
maintain discipline. He could not bring himself to 
transfer or punish those who committed serious 
errors. He is replaced by a general who institutes 
strict discipline and brings the group up to a highly 
effective state. 

This film was used in the training program of the 
Officer Military Schools at Lackland Air Force Base. 

The lectures. The film was reviewed to deter- 
mine which scenes were especially relevant to the 
development of a favorable attitude toward disci- 
pline. Lectures designed to point out those scenes 
particularly relevant to the importance of discipline 
were developed. 

A prelecture pointed out to the Ss what to look for in the film. It told of certain scenes to which they should direct their attention and gave a brief resumé of the effects of the variation in discipline as portrayed in the motion picture. A postlecture covered exactly the same topics but informed the Ss what they should have noted in the film. Both lectures required about eight minutes.

In addition, two brief lectures were written which included the principal topics of the other two lectures. One of the short lectures was used as an introduction, and the other as a conclusion. These were each about four and a half minutes long and were always used in conjunction with each other.

The lectures were delivered by an Air Force Offi- 
cer who wore campaign ribbons and pilot’s wings. 
It was felt that the ribbons and wings would in- 
crease the prestige of the lecturer as someone who 
evidently could speak from experience and with authority. The lectures were delivered from notes somewhat informally, although the lecturer confined himself within narrow limits.

The word level of the lectures was evaluated as 
Fairly Easy by means of the Flesch Reading Ease 
Score. This is estimated as the level which could 
be read with ease by about 88% of the U. S. popu- 
lation. 
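For reference, the Flesch Reading Ease score combines average sentence length in words with average word length in syllables. The sketch below implements the standard formula and Flesch's verbal bands, in which 70 to 80 is Fairly Easy. It takes word, sentence, and syllable counts as inputs because it does not attempt syllable counting. The example counts are hypothetical, not taken from the lectures.

```python
def flesch_reading_ease(words: int, sentences: int, syllables: int) -> float:
    """Flesch Reading Ease: 206.835 - 1.015 * (words/sentence) - 84.6 * (syllables/word)."""
    return 206.835 - 1.015 * (words / sentences) - 84.6 * (syllables / words)


# Flesch's verbal bands, highest threshold first.
BANDS = [
    (90, "Very Easy"),
    (80, "Easy"),
    (70, "Fairly Easy"),
    (60, "Standard"),
    (50, "Fairly Difficult"),
    (30, "Difficult"),
]


def describe(score: float) -> str:
    """Map a Reading Ease score to Flesch's verbal description."""
    for threshold, label in BANDS:
        if score >= threshold:
            return label
    return "Very Difficult"


if __name__ == "__main__":
    # HYPOTHETICAL counts for a 100-word excerpt of lecture text.
    score = flesch_reading_ease(words=100, sentences=8, syllables=140)
    print(f"{score:.1f} {describe(score)}")
```

With these counts, 12.5 words per sentence and 1.4 syllables per word, the score is 75.7, which falls in the Fairly Easy band.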

The questionnaire. Twenty-three items dealing 
with discipline were assembled in a tentative atti- 
tude scale. This questionnaire was given to 311 Air 
Force basic trainees. After considering the propor- 
tion answering each alternative and the meaning of 
the item, responses to the items were dichotomized 
close to the 50% point. Item analyses were per- 
formed according to the method of Davis (1). A 
total of 16 items remained after item analysis (1, Ap- 
pendix B). An estimate of the internal consistency 
of the items was obtained by means of Kuder-Richardson Formula 20. The value of the coefficient was .46. Although it is recognized that the
Kuder-Richardson procedure underestimates the re- 
liability of the questionnaire, the number of Ss was 
increased to compensate for the low internal con- 
sistency. 
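Because responses were dichotomized, KR-20 applies directly. The sketch below computes it from a subjects-by-items matrix of 0/1 scores. It uses population (divide-by-N) variances so that the item terms p·q and the total-score variance are computed on the same basis. The toy response matrix in the check is hypothetical.

```python
def kr20(responses: list[list[int]]) -> float:
    """Kuder-Richardson Formula 20 for dichotomous (0/1) item responses.

    responses[s][i] is 1 if subject s gave the keyed answer to item i, else 0.
    KR-20 = k / (k - 1) * (1 - sum(p_i * q_i) / var(total score)).
    """
    n, k = len(responses), len(responses[0])
    totals = [sum(row) for row in responses]
    mean = sum(totals) / n
    var_total = sum((t - mean) ** 2 for t in totals) / n  # population variance, like p*q
    sum_pq = 0.0
    for i in range(k):
        p = sum(row[i] for row in responses) / n  # proportion keyed on item i
        sum_pq += p * (1 - p)
    return k / (k - 1) * (1 - sum_pq / var_total)
```

For the four-subject, three-item matrix `[[1, 1, 1], [1, 1, 0], [1, 0, 0], [0, 0, 0]]`, the item p·q terms sum to .625, the total-score variance is 1.25, and KR-20 is 1.5 × (1 - .5) = .75.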

Procedure. There were four groups with four 
flights in each group. The assignment of flights to 
groups was done with the aid of a table of random 
numbers.

Each of three of the groups heard one of the lec- 
tures described above, saw the film, and took the 
questionnaire. The other group was a control group 
which saw the film and took the questionnaire. It 
did not hear any of the lectures. 

The questionnaire was administered at the conclusion of the film and/or the lecture. There was no pretest to determine initial attitude toward discipline. It was felt that a pretest might alert Ss to those aspects of the film dealing with discipline. If this were true, the effect of the pretest might be sufficiently strong to outweigh the effects of the lectures.

Accordingly, the principle of randomization was resorted to in order to control and equate relevant variables.


Table 1

Significance of Differences in Attitude Toward Military Discipline

Source of Variation                           df    Mean Square      F
Within flights                               860       10.15
Between flights                               15       28.53
   Between treatments                          3       81.12       7.93*
   Between flights within treatments          12       15.47       1.52
Pooled error                                 872       10.23

* Significant beyond the .001 level.




Results 


The first step in the analysis of the results 
was analysis of variance. Since the sampling 
unit was the flight rather than the individual, 
it was necessary to determine whether signifi- 
cant variation existed between the flights 
within the treatment groups when compared 
with the variation between individuals. This 
comparison did not yield a statistically sig- 
nificant F ratio. Accordingly, these two 
sources of variation were pooled to provide 
an error term for the comparison of differ- 
ences between treatments. This comparison 
was significant beyond the .001 level. The results of this analysis are shown in Table 1.
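The two F tests in Table 1 can be recomputed from the printed mean squares. The sketch below assumes the pooled error is the degrees-of-freedom-weighted average of the within-flights and flights-within-treatments mean squares, which matches the pooling the text describes. Working from the rounded table entries gives a pooled error of about 10.22 rather than the printed 10.23, presumably because the authors pooled unrounded sums of squares. The F ratios still come out at 1.52 and 7.93.

```python
def pool_mean_squares(ms_a: float, df_a: int, ms_b: float, df_b: int) -> float:
    """Pool two mean squares: combined sum of squares over combined degrees of freedom."""
    return (ms_a * df_a + ms_b * df_b) / (df_a + df_b)


# Rounded mean squares and degrees of freedom as printed in Table 1.
MS_WITHIN, DF_WITHIN = 10.15, 860      # individuals within flights
MS_FLIGHTS, DF_FLIGHTS = 15.47, 12     # flights within treatments
MS_TREAT, DF_TREAT = 81.12, 3          # treatments (lecture conditions)

f_flights = MS_FLIGHTS / MS_WITHIN     # do flights differ beyond individual variation?
pooled = pool_mean_squares(MS_WITHIN, DF_WITHIN, MS_FLIGHTS, DF_FLIGHTS)
f_treat = MS_TREAT / pooled            # treatments tested against the pooled error

if __name__ == "__main__":
    print(round(f_flights, 2), round(pooled, 2), round(f_treat, 2))  # 1.52 10.22 7.93
```

Pooling is defensible here only because the flights-within-treatments F was not significant. Had it been, treatments would have been tested against the flights-within-treatments mean square on 3 and 12 degrees of freedom.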

The results of the analysis indicated the 
existence of significant variation in attitude 
toward discipline, but did not point out which 
groups differed from one another. Accordingly, t tests were made to compare groups. These results are shown in Table 2.

Each of the three lecture groups expressed a more favorable attitude toward discipline than the control group, which did not receive a lecture. On the other hand, there were no significant differences between the lectures in their effect on attitude toward discipline.


Discussion 


The results of this study clearly indicate 
that it is possible to improve the attitude- 
changing character of a motion picture by 
supplementing it with appropriate lecture 
presentations. 


Table 2

Differences Between Mean Attitude Scores of Experimental and Control Groups

Groups*    Mean Difference     N      SD diff      t        P
I-II             .07          439      .313      .224
I-III            .03          438      .311      .096
I-IV             .95          437      .312     3.045     .01
II-III           .04          439      .303      .132
II-IV            .88          438      .303     2.904     .01
III-IV           .92          437      .302     3.046     .01

* I. Introduction and conclusion lectures; II. Prelecture; III. Postlecture; IV. Nonlecture (control).
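Each t in Table 2 equals the mean difference divided by the entry in the SD column, so that column is evidently the standard error of the difference between means. The sketch below checks this reading against every row. The group labels follow the table footnote.

```python
# (comparison, mean difference, SD column entry, t as printed) from Table 2.
TABLE_2 = [
    ("I-II", 0.07, 0.313, 0.224),
    ("I-III", 0.03, 0.311, 0.096),
    ("I-IV", 0.95, 0.312, 3.045),
    ("II-III", 0.04, 0.303, 0.132),
    ("II-IV", 0.88, 0.303, 2.904),
    ("III-IV", 0.92, 0.302, 3.046),
]


def t_ratio(mean_difference: float, se_difference: float) -> float:
    """Critical ratio: difference between means divided by its standard error."""
    return mean_difference / se_difference


if __name__ == "__main__":
    for groups, diff, se, printed in TABLE_2:
        print(f"{groups:7s} computed t = {t_ratio(diff, se):.3f}  printed t = {printed:.3f}")
```

Every computed ratio agrees with the printed t to within .001. Only the three comparisons with the nonlecture group (IV) reach significance.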




There are, however, certain limitations that 
stem from specific practices and materials 
used in this study. It should not be con- 
cluded that a film having only a remote re- 
lation to a training objective can be made 
useful by lecture supplement. Twelve O’Clock 
High contained several sequences bearing very 
closely on the need for discipline in military 
forces. 

It is also possible that variations in the way the lectures are presented might cause differences in their effectiveness. It may be that, to basic trainees, the officer who delivered the lectures had enough prestige that his influence outweighed any effect of where the lectures were placed in relation to the film.

It should also be mentioned that the Ss had 
only limited experience with the Air Force. 
It is quite possible that additional experience 
with military discipline may further change 
their attitudes toward discipline. 


Summary and Conclusions 


This report describes an experimental evalu- 
ation of the effect on expressed attitudes to- 
ward discipline of lecture supplements to a 
film. The lectures were designed to point out 
significant sequences in the film which were 
thought to stress the need for discipline in 
the military service. 

Three lectures were developed. They were 
comparable in content. One was used prior 
to the film; another was used after the film; 
a third was designed to be presented part be- 
fore and part after the film. 

Those Ss hearing lectures with the film 
were generally more favorable in attitude to- 
ward military discipline than those who saw 
the film alone. There were, however, no sta- 
tistically significant differences between the 


positions of lectures in their effect on atti- 
tude toward discipline. 


Received May 18, 1955. 


References 


1. Davis, F. B. Item-analysis data. Cambridge: Harvard Univer. Press, 1949.

2. Hovland, C. I., Lumsdaine, A. A., & Sheffield, F. D. Studies in social psychology in World War II: Experiments on mass communication. Princeton: Princeton Univer. Press, 1949.

3. Hulett, J. E., Jr. Estimating the net effect of a commercial motion picture upon the trend of local public opinion. Amer. sociol. Rev., 1949, 14, 263-275.

4. Krech, D., & Crutchfield, R. S. Theories and problems of social psychology. New York: McGraw-Hill, 1948.

5. McFarlane, A. M. A study of the influence of the educational geographical film upon the racial attitudes of a group of elementary school children. British J. educ. Psychol., 1945, 15, 152-153.

6. Murphy, G., Murphy, L. B., & Newcomb, T. M. Experimental social psychology. New York: Harper, 1937.

7. Newcomb, T. M. Social psychology. New York: Dryden, 1950.

8. Peterson, R. C., & Thurstone, L. L. The effect of a motion picture film on children’s attitude toward Germans. J. educ. Psychol., 1932, 23, 241-246.

9. Peterson, R. C., & Thurstone, L. L. Motion pictures and the social attitudes of children. New York: Macmillan, 1933.

10. Raths, L. E., & Trager, F. M. Public opinion and “Crossfire.” J. educ. Sociol., 1948, 21, 345-368.

11. Rosen, I. C. The effect of the motion picture “Gentleman’s Agreement” on attitude toward Jews. J. Psychol., 1948, 26, 525-536.

12. Wiese, M. J., & Cole, S. G. A study of children’s attitudes and the influence of a commercial motion picture. J. Psychol., 1946, 21, 151-171.

13. The Pennsylvania State College. Practical principles governing the production and utilization of sound motion pictures. Instructional Film Research Program, 1950.


The Journal of Applied Psychology 
Vol. 40, No. 2, 1956 


Leadership and Predictive Abstracting 


C. G. Browne and Richard P. Shore 
Wayne University 


Recently the emphasis of research on lead- 
ership by both psychologists and sociologists 
has been on methods which look upon leader- 
ship as a dynamic relationship existing be- 
tween the “leaders” and the “followers,” or 
between the leader and the group. This is in 
somewhat sharp contrast with the earlier em- 
phasis in which leaders were studied in isola- 
tion from the group, largely in terms of per- 
sonal characteristics that were measurable 
within the leader himself (11). 

In attempting to analyze this dynamic re- 
lationship between the leader and the group 
and to arrive at the variables which may af- 
fect the relationship, it has appeared to some 
that there is at least face validity in the hy-
pothesis that the individual who will be most
successful in influencing the actions of a 
group, and therefore ordinarily classified as 
the leader, will be the individual who knows 
the thinking of the group best. With this 
knowledge the individual can guide his own 
behavior in attempting to influence the be- 
havior of the group toward some specific ac- 
tion or goal. A number of studies of this 
“knowledge of others” or “prediction” vari- 
able have been published (1, 2, 3, 4, 5, 6, 8, 
9, 10). 

Some investigators have referred to the pre- 
diction variable as “empathy.” To the pres- 
ent authors, however, this is not a satisfac- 
tory designation for the phenomenon being 
studied. The term empathy is borrowed from 
clinical psychology and carries a connotation 
which relates to one individual putting him- 
self in another individual’s place and in effect 
experiencing the other person’s thoughts and 
emotions. In short, “to empathize” means 
something different, perhaps something more, 
than to predict or to attempt to determine, 
and just determine, another person’s thinking. 
In addition, empathizing usually is the result 
of one individual being told by another indi-
vidual what his thoughts and experiences have
been, or at least learning of them through
some particular, identifiable channels.

The essence of the leadership variable previ-
ously mentioned does not include either (a)
the taking on of the experience and the emo- 
tion of another individual, or (b) the neces- 
sity of learning specific knowledge regarding 
another person’s experiences through any spe- 
cific channels. Rather the leadership vari- 
able relates to an individual’s ability to take 
from all of his knowledge of individuals or 
groups those particulars which will prove to 
be determiners of the behavior or attitude, 
etc. of an individual or a group on any given 
question at any given time. Fundamentally 
the process involved is the process of ab- 
stracting, which is the process of selecting 
certain details from an event and eliminating 
other details which are included in the same 
event (7). It follows, then, that the extent 
to which any individual is able to predict the 
attitudes, etc., of any other individual will be 
dependent largely on the extent to which he 
is able to select details from an event which 
are pertinent to and have an influence on the 
areas to be predicted. The leadership vari- 
able being investigated in this study, there- 
fore, will be referred to as “predictive ab-
stracting” since it is believed that this term
is a more satisfactory operational description 
of the process involved than are the other 
terminologies which have been used by other 
investigators. 

Specifically, the present study is concerned 
with the following hypotheses: (a) predictive 
abstracting is a function of leadership; (b) 
a direct relationship exists between an indi- 
vidual’s predictive abstracting ability and his 
echelon level in an industrial organization. 


Subjects and Procedure 


Four echelon levels of a Detroit metal tubing 
manufacturing company (Wolverine Tube, Division 
of Calumet and Hecla, Inc.) were sampled to pro- 
vide a total of 83 subjects (Ss), divided as follows: 
5 department managers (most focal echelon used),
9 general foremen selected from various plant de- 
partments, 17 assistant foremen from various de- 
partments, and 52 nonsupervisory personnel selected 
at random from an employee roster. The nonsuper- 
visory personnel were sampled proportionately from 
mill and engineering departments and from each of 
the three working shifts. 

A group-administered questionnaire was devised 
which consisted of 27 statements, divided into nine 
statements each for attitudes toward job satisfaction, 
economic issues, and social issues. The statements 
were classified into the three areas by agreement of 
five persons experienced in attitude surveying. Re- 
sponses were made on a four-point intensity scale 
ranging from “strongly agree” to “strongly disagree.” 
The relative order of the response columns was re- 
versed at random for approximately half of the state- 
ments. On the questionnaire the statements were ar- 
ranged in random order. Responses were scored as 
follows: strongly agree, four points; agree, three 
points; disagree, two points; and strongly disagree, 
one point. All data were analyzed on the basis of 
these values. 
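The scoring rule can be written as a lookup from response label to points. In this minimal sketch, responses are recorded by label. Each response is scored by what it says rather than by the column where it was marked, so reversing the column order on about half the statements leaves every score unchanged. The function names and the data layout are hypothetical.

```python
# Point values for the four-point intensity scale described in the text.
SCALE = {
    "strongly agree": 4,
    "agree": 3,
    "disagree": 2,
    "strongly disagree": 1,
}


def score_response(label: str) -> int:
    """Score a response by its label, not by its printed column position."""
    return SCALE[label.strip().lower()]


def item_means(responses: list[list[str]]) -> list[float]:
    """Mean scored response for each statement across a group.

    responses[s][i] is respondent s's answer to statement i (hypothetical layout).
    """
    n_items = len(responses[0])
    return [
        sum(score_response(r[i]) for r in responses) / len(responses)
        for i in range(n_items)
    ]


group = [
    ["agree", "strongly disagree"],
    ["strongly agree", "disagree"],
]
print(item_means(group))  # [3.5, 1.5]
```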

The questionnaires were administered as follows: 

1. Each S completed the questionnaire on the basis
of his own attitudes toward the questions included.


Although the present interest was in the predictive 
abstracting ability and not in the attitudes as such, 
it was necessary to obtain a scoring of individual 
attitudes for comparative purposes with the predic- 
tive abstracting responses. 

2. Each department manager completed a ques- 
tionnaire on the basis of his abstracting of the re- 
sponses of nonsupervisory personnel as a group. 

3. Each general foreman and each assistant fore- 
man completed two questionnaires involving predic- 
tive abstracting—one for his abstracting of nonsu- 
pervisory personnel responses and the other for his 
abstracting of department manager responses. 

4. Each nonsupervisory person completed a ques-
tionnaire on the basis of his abstracting of the re-
sponses of department managers as a group. 

Thus, there were two major types of predictions— 
those on nonsupervisory personnel made by the 
three supervisory echelons and those on department 
managers made by general foremen, assistant fore- 
men, and nonsupervisory personnel. 

To determine the accuracy with which one echelon 
abstracted the attitudes of another echelon, a predic- 
tive abstracting score (PRAB score) was computed 
for each statement. The PRAB score is the differ- 
ence, without regard to algebraic sign, between the 
value of a predicted response and the actual mean 
response value of the group on whom the prediction 
was made. For example, if an individual predicted 
that the mean response of a group on a given state- 
ment would be “strongly agree” or a score of 4, and 
the actual mean response was 3.1, the PRAB score 
would be .9. To determine group predictive ab- 
stracting accuracy on an item, the mean PRAB 
scores of all Ss in the predicting group were used. 
Higher PRAB scores indicate less accurate predictive
abstracting, since a PRAB score of zero would
represent perfect prediction.
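The PRAB computation is short enough to state directly. The minimal sketch below uses hypothetical function names and reproduces the worked example in the text, where a predicted 4 is set against an actual group mean of 3.1. The second call uses an invented three-person predicting group to show the group-level mean PRAB score for one item.

```python
def prab_score(predicted: float, actual_group_mean: float) -> float:
    """Absolute difference between a predicted response value and the actual
    mean response of the group predicted on (0 means perfect prediction)."""
    return abs(predicted - actual_group_mean)


def group_prab(predictions: list[float], actual_group_mean: float) -> float:
    """Mean PRAB score over all members of a predicting group on one item."""
    return sum(prab_score(p, actual_group_mean) for p in predictions) / len(predictions)


# Worked example from the text: "strongly agree" (4) against an actual mean of 3.1.
print(round(prab_score(4, 3.1), 2))  # 0.9
# Hypothetical predicting group of three Ss on the same item: (0.9 + 0.1 + 1.1) / 3.
print(round(group_prab([4, 3, 2], 3.1), 2))  # 0.7
```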

A nonparametric test of significance, the ranking 
method (13), was used to determine whether two 
echelons differed significantly in their ability to
abstract the responses of another echelon. For
example, in comparing the relative ability of
department managers and general foremen to abstract
the job satisfaction attitudes (nine items) of
nonsupervisory personnel, ranks ranging from 1 to 18
were assigned to the pooled PRAB scores of the two
predicting echelons. Then the nine ranks for each of
the predicting echelons were summed. The smaller sum
of ranks was
entered in a table of probability values (12) and the 
significance of the difference between the two sets of 
summed ranks was determined. When a significant 
difference was obtained, the smaller sum of ranks 
indicated the echelon which abstracted attitudes with 
a significantly greater degree of accuracy than the 
other echelon. 
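The ranking method amounts to a two-sample rank-sum comparison, which is Wilcoxon's procedure, cited here as (12) and (13). The minimal sketch below rests on three stated assumptions. The function names are hypothetical. Tied PRAB scores receive their average rank, since the text does not say how ties were handled. A smaller sum at or below the tabled value is treated as significant, which is our reading of the table rather than something the article states. The critical values are the 9-statement figures from the note to Table 1.

```python
def rank_with_ties(values):
    """Assign ranks 1..n in ascending order; tied values share their
    average rank (tie handling is an assumption, not stated in the text)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run of tied values.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        average_rank = (start + end) / 2 + 1  # mean of 1-based positions start..end
        for k in range(start, end + 1):
            ranks[order[k]] = average_rank
        start = end + 1
    return ranks


# Critical sums of ranks for 9 statements per echelon (note to Table 1).
# Assumption: a smaller sum at or below the tabled value is significant.
CRITICAL_9 = [(0.01, 56), (0.02, 59), (0.05, 63)]


def compare_echelons(prab_a, prab_b):
    """Rank the pooled item PRAB scores of two echelons, sum ranks per echelon,
    and return (sum_a, sum_b, strictest level reached or None)."""
    ranks = rank_with_ties(list(prab_a) + list(prab_b))
    sum_a = sum(ranks[: len(prab_a)])
    sum_b = sum(ranks[len(prab_a):])
    smaller = min(sum_a, sum_b)
    level = next((alpha for alpha, crit in CRITICAL_9 if smaller <= crit), None)
    return sum_a, sum_b, level


# Hypothetical PRAB scores: echelon A is more accurate on every item.
a = [i / 10 for i in range(1, 10)]
b = [i / 10 for i in range(10, 19)]
print(compare_echelons(a, b))  # (45.0, 126.0, 0.01)
```

With 18 pooled ranks, the two sums must total 18 × 19 / 2 = 171. With 54 pooled ranks (27 statements per echelon), they must total 1485. Either figure gives a quick arithmetic check on printed sums of ranks.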


Results 


It is hypothesized that predictive abstract- 
ing is a function of leadership and that a 
direct relationship exists between industrial
echelon level and predictive abstracting accuracy
as measured by PRAB scores. In Table 1 the sums of ranks
(based on PRAB scores) of the three pre- 
dicting echelons on both department managers 
and nonsupervisory personnel are presented, 
each echelon being compared with the other 
two echelons in predictive abstracting ability. 
In all of the data the smaller sum of ranks 
indicates the echelon which predicted more 
accurately.

The data in Table 1 indicate that generally 
throughout all of the predictive abstracting 
(a) the supervisory personnel predicted more 
accurately than the nonsupervisory, and (b) 
the more focal supervisory personnel pre- 
dicted more accurately than the less focal. 
Although in most cases the differences are not 
great enough to be statistically significant at 
the 5% level, the trend is consistent enough
throughout to lend support to the hy-
pothesis.

In the predictions on the department man- 
agers, both the general foremen and the as- 
sistant foremen predicted on the total ques- 
tionnaire with a significantly greater degree 
of accuracy (1% level) than the nonsuper- 
visory personnel. Although the superiority of 
predicting was not statistically significant for 
each of the three divisions of the questionnaire,
it was significant at the 1% level for the assistant
foremen’s predictions of economic issues, and
approached significance at the 5% level in the
predictions of the general foremen for all of the
three divisions. The comparisons of the predicting
of the general foremen and the assistant foremen
yielded no significant differences, either for the
total questionnaire or for any of the three divisions.

In the predictions on the nonsupervisory
personnel there was a general tendency for
department managers to predict the attitudes
of nonsupervisory personnel with greater
accuracy than either the general foremen or the
assistant foremen. Statistical significance was
reached, however, only in the predictions of
department managers on economic issues when
compared with general foremen (2% level).
When compared with each other, the general
foremen and the assistant foremen were very
close in their PRAB scores, and no trend for
one group or the other to be superior is
indicated.


Table 1
Industrial Echelon Differences in Predictive Abstracting of Attitudes Based on PRAB Scores

                                           Sum of Ranks
                              Total          Job            Economic   Social
                              Questionnaire  Satisfaction   Issues     Issues
Predictive abstracting on department managers
  General foremen             580***          65             64         66
  Nonsupervisory              905            106            107        105
  Assistant foremen           567***          75             53***      73
  Nonsupervisory              918             96            118         98
  General foremen             714             94             71         85
  Assistant foremen           770             76            100         86
Predictive abstracting on nonsupervisory personnel
  Department managers         659             90             59**       80
  General foremen             825             80            112         90
  Department managers         671             83             68         83
  Assistant foremen           813             88            103         88
  General foremen             746             78             95         85
  Assistant foremen           739             93             76         85

** Difference between groups significant at 2% level of confidence.
*** Difference between groups significant at 1% level of confidence.
Note. To be significant, the following sums of ranks are required:

                    5%     2%     1%
  9 statements      63     59     56
  27 statements    629    608    593


An hypothesis to the effect that any echelon
would predict more effectively the attitudes of
individuals in adjoining or near echelons in the
organizational hierarchy would not
be supported with the present data. In pre- 
dictions on the department managers, the as- 
sistant foremen predicted almost equally well 
as the general foremen, who are nearer the 
department managers in echelon level. In 
predictions on the nonsupervisory personnel, 
the department managers predicted more ac-
curately than either the general foremen or
the assistant foremen, even though both of
the foremen groups are nearer the nonsuper- 
visory personnel in the organizational struc- 
ture and in their daily working contacts. The 
data would, however, give support to an 
hypothesis that predictive abstracting is an 
aspect of an individual’s behavior in his rela- 
tions with other people which is not depend- 
ent upon the extent of his acquaintance or 
contacts with the individuals whose behavior 
he is predicting. 

Table 2
Predictive Abstracting of Attitudes by Department Managers and Nonsupervisory Personnel

                                          Sum of Ranks
                                   Total          Job            Economic   Social
                                   Questionnaire  Satisfaction   Issues     Issues
Department managers predicting
  on nonsupervisory                654             99             65         61*
Nonsupervisory predicting
  on department managers           831             71            106        110

* Difference between groups significant at 5% level of confidence.


In Table 2 the data are presented for the
comparison of the PRAB scores (in sums of
ranks) of the most focal echelon, the department
managers, with the least focal echelon,
the nonsupervisory personnel. On the total 
questionnaire, on economic issues and on so- 
cial issues, the department managers pre- 
dicted more accurately, the difference being 
significant at the 5% level for social issues, 
and very nearly significant for the economic 
issues. The nonsupervisory personnel, how- 
ever, predicted the department managers more 
accurately on job satisfaction, although the 
difference is not near significance. These 
data related to the PRAB scores of the two 
extreme echelons used in the study again add
evidence in support of the hypothesis that
predictive abstracting is related to echelon
level, the more focal echelons having lower
(more accurate) PRAB scores. From the standpoint of
industrial relations, these data may lead to a
consideration of the possibility that industrial 
conflict arises more as the result of lack of 
understanding of management personnel by 
nonsupervisory personnel than it does from a
lack of management’s understanding of non-
supervisory thinking.

The data in Table 3 present comparisons 
of general foremen and assistant foremen in 
their predictive abstracting of the attitudes 
of department managers and nonsupervisory 
personnel. On the total questionnaire, on 
economic issues, and on job satisfaction, both 
foremen groups predicted department man- 
agers better than they predicted nonsuper- 
visory personnel, and in all cases the differ- 
ences were significant at the 5% level or 
higher. On social issues there is no difference 
for either foremen group in its predictive ab- 
stracting of either the department managers 
or the nonsupervisory personnel. These data
also argue against any hypothesis that PRAB scores
are likely to be lower (more accurate) when they are related
to adjoining or near echelons or if working
contacts are closer, since both of the foremen 
groups are nearer the nonsupervisory personnel
in the extent of their contacts with them
and in the organizational hierarchy. In spite
of the echelon distance, however, the foremen
predicted the attitudes of department man-
agers consistently better. When the data in
Table 3 are compared with the data in Table
2 in terms of industrial relations, they may
support the beliefs of many business execu-
tives that the great need for training in hu-
man relations and general understanding of
human problems is with the outer echelons of
supervision, with those individuals who are
closest to nonsupervisory personnel but who
show lack of understanding in dealing with
them.


Table 3
Predictive Abstracting of Attitudes by General and Assistant Foremen

                                         Sum of Ranks
                               Total          Job            Economic   Social
                               Questionnaire  Satisfaction   Issues     Issues
Predictive abstracting by general foremen
  Department managers          612*            63*            56***      83
  Nonsupervisory               873            108            114         87
Predictive abstracting by assistant foremen
  Department managers          574**           62*            53***      85
  Nonsupervisory               911            109            118         86

* Significant at 5% level of confidence.
** Significant at 2% level of confidence.
*** Significant at 1% level of confidence.

Summary 


Eighty-three employees of Wolverine Tube, 
a Detroit manufacturing firm, representing 
four echelons of the business (department 
managers, general foremen, assistant foremen, 
and nonsupervisory personnel) were adminis- 
tered an attitude questionnaire dealing with 
job satisfaction, economic issues, and social 
issues. Following this, each echelon pre- 
dicted the attitudes of the department man- 
agers as one group and the nonsupervisory 
personnel as a second group. The process in- 
volved in these predictions is designated as 
“predictive abstracting,” and the absolute difference
between the individual’s predictions and the 
mean score of the group on whom he is pre- 
dicting is designated as a PRAB score. On 
the basis of the predictive abstracting of the 
various echelons, PRAB scores were obtained 
for the department managers’ predictions on 
the nonsupervisory personnel, the nonsuper- 
visory personnel on the department managers, 
and the general foremen and the assistant 
foremen on both the department managers 
and the nonsupervisory personnel. A non-
parametric test of significance, the ranking
method, was used in analyzing the data. 

In this study it is assumed that leadership 
is involved in the performance of supervisory 
and executive industrial functions. Through- 
out all of the predictive abstracting, the data 
generally support the hypothesis that predic- 
tive abstracting is an aspect of leadership, 
with the following specific observations: (a) 
the supervisory personnel predicted more ac- 
curately than the nonsupervisory, and (b) 
the more focal supervisory echelons predicted 
more accurately than the less focal. A study
is needed which will test the extent to which 
predictive abstracting is a function of the in- 
dividual and/or a function of the position 
which the individual occupies. When these 
data are available, we shall be in a better po- 
sition to determine whether or not the PRAB 
score of an individual may be used effectively 
in predicting his probable performance as a 
leader or supervisor or executive, or whether 
the PRAB score will serve only as an indi- 
cator of areas of training which are needed if 
the individual is to perform successfully in a 
leadership capacity. 


Received April 4, 1955. 


References 


1. Bender, I. E., & Hastorf, A. H. The perception 
of persons; forecasting another person’s re- 
sponses on three personality scales. J. ab-
norm. soc. Psychol., 1950, 45, 556-561.

2. Browne, C. G. Study of executive leadership in 
business, III. Goal and achievement index. 
J. appl. Psychol., 1950, 34, 82-87. 

3. Browne, C. G., & Neitzel, Betty J. Communica- 
tion, supervision, and morale. J. appl. Psy- 
chol., 1952, 36, 86-91. 

4. Chowdry, K., & Newcomb, T. M. The relative 
abilities of leaders and non-leaders to estimate 
opinions of their own groups. J. abnorm. 
soc. Psychol., 1952, 47, 51-57. 

5. Dymond, Rosalind F. Personality and empathy. 
J. consult. Psychol., 1950, 14, 343-350. 

6. Kerr, W. The empathy test. Chicago: Psy- 
chometric Affiliates, 1947. 

7. Korzybski, A. Science and sanity. Lakeville, 
Conn.: Institute of General Semantics, 1948. 

8. Norman, R. D., & Ainsworth, Patricia. The re- 
lationships among projection, empathy, re- 
ality, and adjustment, operationally defined. 
J. consult. Psychol., 1954, 18, 53-58. 

9. Remmers, L. J., & Remmers, H. H. Studies in
industrial empathy. I. Labor leaders’ atti- 
tudes toward industrial supervision and their 
estimate of managements’ attitudes. Person- 
nel Psychol., 1949, 2, 427-436. 

10. Remmers, L. J., & Remmers, H. H. Studies in 
industrial empathy. II. Managements’ atti- 
tudes toward industrial supervision and their 
estimate of labor attitudes. Personnel Psy- 
chol., 1950, 3, 33-40. 

11. Stogdill, R. M. Personal factors associated with 
leadership: a survey of the literature. J. Psy- 
chol., 1948, 25, 35-71. 

12. Wilcoxon, F. Probability tables for individual 
comparison by ranking methods. Biometrics, 
1947, 3, 119-122. 

13. Wilcoxon, F. Some rapid approximate statisti-
cal procedures. Stamford: American Cyanamid
Company, 1949. 




The Journal of Applied Psychology
Vol. 40, No. 2, 1956


Validity of Extrapolating Nonresponse Bias from Mail
Questionnaire Follow-ups¹


Herbert Zimmer²


Air Force Personnel and Training Research Center, Officer Education Research Laboratory, 
Maxwell Air Force Base, Alabama 


Sample surveys usually seek to describe
some aspect of a defined population. As such, 
their accuracy rests on the precision with 
which the sample represents the population. 
A considerable body of literature points to 
the possibility that nonrespondents may differ 
markedly from respondents in any survey 
sample, and that neither group may consti- 
tute an unbiased sample of the population by 
itself. The literature on nonresponse bias in 
mail questionnaires is far too extensive to be 
discussed here, but many of the relevant ar- 
ticles are summarized in two reviews (7, 9). 
The problem of nonresponse bias is not re- 
stricted to mail questionnaires alone, but oc- 
curs in any survey that attempts to represent 
a population fairly, regardless of the research 
tool employed (4). An awareness of this 
possible bias has led surveyors to use a num- 
ber of ingenious devices for increasing the 
initial response and for follow-up communi- 
cations, in order to reduce the proportion of 
nonrespondents. Successful as such efforts 
usually are, the careful investigator is still 
obliged to describe the inevitable residue of 
nonrespondents, inasmuch as they continue 
to pose the possibility of prejudicing the 
sample. 

The surest and most adequate method of 
accounting for nonrespondents to mail ques- 
tionnaires is by personal contact and inter- 
view. In practice this is rarely done, since 
even the pursuit of only a sample of nonre- 
spondents by telephone or intensive mail bar- 
rage is costly and time consuming (1, 4). 
Several investigators (2, 3, 8) have attempted 
to devise a shortcut to obtaining the unknown 
characteristics of nonrespondents. They em-
ployed, or suggested, the use of the several
coordinate values of the initial response group
and of follow-up response groups to extra-
polate any obtained trend to unknown values
for the group of nonrespondents.

¹ The author is indebted to Harry M. Henkin and
Stephen J. Zolczynski for their help in carrying out
the statistical computations.
² Now at Georgetown University School of Medicine.

A rationale for this procedure can be elabo- 
rated. The successive subgroups within a 
sample which range from the initial response 
group, through the various groups which re- 
sponded to an ever-increasing number of fol- 
low-up communications, to the group of non- 
respondents may be said to have a decreasing 
probability of response and an increasing 
probability of nonresponse. This relation- 
ship may be expressed symbolically, begin- 
ning with the initial response group and end- 
ing with the nonresponse group, as 


$$pq^{n} + prq^{n-1} + pr^{2}q^{n-2} + pr^{3}q^{n-3} + \cdots + pr^{n-1}q^{n-(n-1)} + q^{n+1}$$


For the residual nonresponse group ($q^{n+1}$),
the probability of nonresponse is given as 
unity. The above expression is suggestive of 
one-half of the binomial expansion. The fit- 
ting of curves derived from empirical data 
can determine the applicability of this for- 
mulation. 

Any factor which varies systematically with 
the probability of response-nonresponse may 
presumably be extrapolated to the nonre- 
sponse group from the set of coordinates fur- 
nished by the successive groups of respond- 
ents. Even on purely theoretical grounds 
this does not imply that all factors can be ac- 
curately predicted. Some factors may be un- 
related to the response-nonresponse variable,
others may be subject to large random varia-
tions; for still others the trend curve may
contain unexpected inflections in the area to 
be extrapolated. If a factor is systematically 
related to the response-nonresponse variable, 
curves which represent this relationship and 
which have been derived from a number of 
independent samples would be expected to
follow a common pattern. By the same 
token, the absence of a common pattern for 
any particular factor would indicate the lack 
of a stable relationship, which would elimi- 
nate the response-nonresponse continuum as 
an adequate basis for projecting that factor 
to the nonresponse group. 
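The extrapolation idea can be illustrated with a least-squares line through the means of the successive response groups, projected one step beyond the last group. This is a minimal sketch under assumptions the article does not make. The groups are placed at equally spaced positions 1, 2, ..., k on the response-nonresponse continuum, and the trend is taken to be linear. The function name is hypothetical.

```python
def extrapolate_to_nonrespondents(wave_means: list[float]) -> float:
    """Fit a least-squares line to group means at positions 1..k and
    project it to position k + 1, taken to stand for the nonrespondents.

    Equal spacing and linearity are assumptions of this sketch.
    """
    k = len(wave_means)
    if k < 2:
        raise ValueError("need at least two response groups")
    xs = range(1, k + 1)
    x_bar = (k + 1) / 2
    y_bar = sum(wave_means) / k
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, wave_means))
    slope = sxy / sxx
    return y_bar + slope * (k + 1 - x_bar)


# Mean ages of the high- and medium-response groups, from Table 1.
print(round(extrapolate_to_nonrespondents([29.00, 27.84]), 2))  # 26.68
```

For age, the projected 26.68 can be set beside the mean of 26.82 that Table 1 reports for the low-response group. With only two responding groups, the "fit" is simply the line through both points.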

A definitive answer to the question of which 
and how many variables can be extrapolated 
with what degree of certainty to a nonre- 
sponse group must come from an evaluation 
of the identical set of factors for a series of 
samples. This is a research task of consid-
erable proportions. The present experiment
is of a more limited scope. Nevertheless, it
can provide a test of the fundamental hy- 
pothesis. 


Hypothesis. The response-nonresponse 
probability function indicates the presence 
and direction of nonresponse bias. 


Procedure 


During the first two weeks of February 1954, a 
questionnaire was sent by registered mail, with re- 
turn receipt requested, together with an explanatory 
letter and return envelope, to 220 U. S. Air Force 
officers and airmen. The questionnaire concerned an 
important life experience in which each of the 220 
addressees had recently participated. About one 
month after the mailing of the questionnaires, 166 
of the questionnaires had been returned. At that 
time, early in March, a follow-up letter was sent to 
the 54 nonrespondents. Between the mailing of the 
follow-up letter and the first week in April an addi- 
tional 26 questionnaires were returned. Four of 
these questionnaires were returned so soon after the
mailing of the follow-up letter that the respondents
could not have had time to receive the follow-up
letter before returning their questionnaires. These
four questionnaires were therefore grouped with the
166 earlier respondents, to give a total of 170 early
respondents and 22 late respondents. Eleven months
after the mailing of the questionnaires, 28 addressees
still had not responded. Of these 28 questionnaires,
two were returned by the Post Office as being
undeliverable or unacceptable to the addressee. All
envelopes carried a legible return address and were
sent by registered mail.


Table 1
Distribution of High-, Medium-, and Low-Response Probability Groups on Seven Variables

                                                        Group
                                             High     Medium   Low      All Subjects
Characteristic                               (N=170)  (N=22)   (N=28)   (N=220)
1. Age                               Mean    29.00    27.84    26.82    28.61
                                     SD       4.06     4.06     5.43     4.33
2. Education                         Mean    14.73    13.91    13.71    14.52
                                     SD       1.69     2.06     2.09     1.83
3. Rank                              Mean     8.20     7.77     6.41     7.93
                                     SD       2.10     2.19     3.07     2.34
4. Years of service                  Mean     6.55     5.33     5.35     6.28
                                     SD       3.27     3.28     3.53     3.35
5. Marital status
   Single                            f        54        6       15       75
   Married                           f       100       15       10      125
   Undetermined*                     f        16        1        3       20
6. Military status
   Regular                           f        60        7       15       82
   Reserve                           f       109       15       11      135
   Undetermined*                     f         1        0        2        3
7. Occupation
   Pilot and copilot                 f        91       12        8      111
   Bombardier and navigator          f        22        1        4       27
   Gunner                            f        29        5        8       42
   Radio, radar and recon specialists f       14        2        2       18
   Crew chief                        f         8        1        2       11
   Other                             f         6        1        4       11

* Not included in computations for Table 2.


Table 2
Group Differences Between High-, Medium-, and Low-Response Probability Groups on Seven Variables

                                              Differences between Groups
                      Over-all              High vs. low     High vs. medium        Medium vs. low
Characteristic        df     F      p       CR     p         CR     p               t            p
1. Age                2/217  3.50   <.05    2.00   <.05      1.62   <.15            [illegible]  <.50
2. Education          2/217  5.21   <.01    2.41   <.02      1.75   <.10            .33          >.70
3. Rank               2/217  7.40   <.001   2.92   <.01      .86    <.40            1.79         <.10
4. Years of service   2/169  1.83   -       -                -                      -
                      df     χ²     p       df  χ²    p      df  χ²           p     df  χ²      p
5. Marital status     2      6.50   <.05    1   5.64  <.02   1   .35          >.50  1   4.54    <.05
6. Military status    2      5.08   <.10    1   4.68  <.05   1   [illegible]  >.70  1   3.21    <.10
7. Occupation         10     12.03  >.20    -                -                      -

This yields the following breakdown of subjects into three
categories:

1. Questionnaires returned before receipt of the follow-up
letter by the respondent (high response probability): 170

2. Questionnaires returned after receipt of the follow-up
letter by the respondent (medium response probability): 22

3. Questionnaires not returned (low response probability): 28

Total questionnaires mailed: 220


A number of personal history items were available
in military records for most of our subjects. The
three groups, high response probability group (N =
170), medium response probability group (N = 22),
and low response probability group (N = 28), were
compared with each other on seven variables: (a)
age, (b) education, (c) rank, (d) years of service,
(e) marital status, (f) military status, and (g)
military occupational specialty.

Results 


The first four of these variables can be 
represented as numerical distributions. Their 
means and standard deviations are shown in 
Table 1. Frequencies are shown for the last
three variables, since information for these
variables is of an unordered nature.

On the first four variables the medium re- 
sponse probability group deviates from the 
high response probability group in the same 
direction as the low response probability 
group. On three of these four variables the 
medium response probability group occupies 
an intermediate position between the two 
other groups. On the remaining variable, 
years of service, analysis of variance does not 
indicate a significant difference between the 
three groups, as shown in Table 2. 

Table 2 shows the statistical significance of 
differences between the three groups. For 
each variable, an over-all test was done first. 
If it suggested that the three groups were not
likely to be random samples from a common 
population, then the three groups were com- 
pared with each other, two at a time. At the 
top of Table 2 appear the four variables 
which yielded numerical distributions. For 
these, analysis of variance was used for the 
over-all test, CR and t for the group-by-group
comparisons. The lower part of Table
2 shows the three variables for which cate- 
gorical information was available. Chi-square 
tests were applied to these data. 
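The over-all analysis of variance for a numerical variable can be reproduced from the summary statistics in Table 1. This sketch assumes that the standard deviations were computed with divisor N, so each group's within-group sum of squares is N × SD². The article does not state this assumption, but it reproduces the F of 3.50 reported for age. The function name is hypothetical.

```python
def f_from_summary(means, sds, ns):
    """One-way analysis-of-variance F ratio from group means, SDs, and sizes.

    Assumes each SD was computed with divisor N, so a group's within-group
    sum of squares is N * SD**2 (use (N - 1) * SD**2 for the N - 1 form).
    """
    total_n = sum(ns)
    grand_mean = sum(m * n for m, n in zip(means, ns)) / total_n
    ss_between = sum(n * (m - grand_mean) ** 2 for m, n in zip(means, ns))
    ss_within = sum(n * s ** 2 for s, n in zip(sds, ns))
    df_between = len(means) - 1
    df_within = total_n - len(means)
    f_ratio = (ss_between / df_between) / (ss_within / df_within)
    return f_ratio, df_between, df_within


# Age of the high-, medium-, and low-response groups (Table 1).
f_ratio, df1, df2 = f_from_summary([29.00, 27.84, 26.82], [4.06, 4.06, 5.43], [170, 22, 28])
print(f"F({df1}, {df2}) = {f_ratio:.2f}")  # F(2, 217) = 3.50
```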

Variable 7, military occupational specialty, 
shows no significant differences between 
groups, even though the small cell frequencies 
lead to an overestimation of existing differences.

The first three variables shown in Table 2 (age,
education, and rank) are consistent with the
hypothesis, though, perhaps owing to the limited
number of cases, not all of the group-by-group
comparisons attain significance.
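The critical ratios for the high- vs. low-response comparisons can be checked from Table 1 in the same way. The sketch keeps the earlier assumption, which the article does not state, that the SDs were computed with divisor N, so each squared standard error of a mean is SD²/(N - 1). Under that assumption it reproduces the CRs of 2.00, 2.41, and 2.92 printed in Table 2 for age, education, and rank. The function name is hypothetical.

```python
import math


def critical_ratio(m1, sd1, n1, m2, sd2, n2):
    """Critical ratio for the difference between two independent means,
    assuming SDs computed with divisor N (squared SE = SD**2 / (N - 1))."""
    se_diff = math.sqrt(sd1 ** 2 / (n1 - 1) + sd2 ** 2 / (n2 - 1))
    return (m1 - m2) / se_diff


# High- (N = 170) vs. low-response (N = 28) groups, values from Table 1.
age = critical_ratio(29.00, 4.06, 170, 26.82, 5.43, 28)
education = critical_ratio(14.73, 1.69, 170, 13.71, 2.09, 28)
rank = critical_ratio(8.20, 2.10, 170, 6.41, 3.07, 28)
print([round(x, 2) for x in (age, education, rank)])  # [2.0, 2.41, 2.92]
```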

Variables 4 and 7, years of service and mili- 
tary occupational specialty, are also consist- 
ent with the hypothesis. For these two vari- 
ables there is no significant nonresponse bias, 
and none is indicated by the response-nonre- 
sponse probability function. 

Variables 5 and 6 are not consistent with 
the hypothesis. While a significant nonre- 
sponse bias exists here, the medium response 
probability group fails to indicate this, and 
is itself closer to the high response probabil- 
ity group than to the low response probabil- 
ity group. 

Of the seven variables against which the 
hypothesis was evaluated, five are found to 
be consistent with it, and two not consistent 
with it. Inasmuch as these tests of the hy- 
pothesis are based on the same sample and 
may not be entirely independent of each 
other, it is not feasible to name the odds of 
getting this particular combination of agree- 
ments and disagreements with the hypothesis. 
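The directional part of the hypothesis can be expressed as a check on three group means: does the medium group deviate from the high group in the same direction as the low group, and does it lie between them? The helper below is hypothetical and covers direction only; in the article a variable also counts as consistent when, as for Variables 4 and 7, no significant difference exists and none is indicated.

```python
def directional_check(high, medium, low):
    """Return (same_direction, intermediate) for three group means.

    same_direction: medium deviates from high in the same direction as low.
    intermediate:   medium lies between high and low (inclusive)."""
    same_direction = (medium - high) * (low - high) > 0
    intermediate = min(high, low) <= medium <= max(high, low)
    return same_direction, intermediate


# Hypothetical group means on one variable
print(directional_check(high=12.0, medium=11.2, low=10.1))  # (True, True)
```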


Discussion 


The present data give some support to 
the hypothesis that the response-nonresponse 
probability function indicates the presence 
and direction of nonresponse bias. Any ac- 
ceptance of this hypothesis must be qualified 
by the obvious exceptions to it. Though this 
method offers the advantage of being more 
economical and less time-consuming than al- 
ternative methods, it should be recognized 
that it is also likely to be less accurate, which 
may more than offset its advantages. Indi- 
cations from this study and from similar 
studies (2, 6) suggest that the use of the 
method considered here may introduce errors 
in estimating the extent and even the presence 
of nonresponse bias for some variables. As 
stated above, larger studies are needed to es- 
tablish more conclusively the predictability 
of nonresponse bias by means of this method. 
In the meantime, exclusive reliance on this 
method to account for all possible aspects of 
nonresponse bias must be suspect. 


Herbert Zimmer 


Evidence from the few other studies avail-
able on this topic shows remarkable agree-
ment with the present results. Edgerton,
Britt, and Norman (5) in a mail question- 
naire study of contestants in the First An- 
nual Science Talent Search report data show- 
ing a step-wise decrease in mean Science 
Aptitude Examination scores for three groups, 
totaling 906 subjects: initial respondents, re- 
spondents to a follow-up letter, and nonre- 
spondents. Baur (2) reports the results of a 
mail survey by the Veterans Administration 
of 6,000 veterans eligible for education under 
the G.I. Bill. Data on five variables are given 
for five separate groups: early initial respond- 
ents, late initial respondents, respondents to 
first follow-up letter, respondents to second 
follow-up letter, and nonrespondents. The 
groups show a gradual and progressive de- 
crease in formal education and in the propor- 
tion of subjects with definite educational 
plans, though data for nonrespondents are 
not available on the educational plans vari- 
able. No significant difference between groups 
is reported for age and for parenthood. The 
groups show a gradual increase in the pro- 
portion of married subjects, with a reversal 
of this trend by nonrespondents, who are 
closest to the early initial respondents on this 
variable. Ford and Zeisel (6) analyzing re- 
turns from a mail questionnaire to 382 for- 
mer employees of a company, in terms of 
initial respondents, respondents to first fol- 
low-up, and respondents to second follow-up, 
found that nonrespondents actually had a 
somewhat lower proportion of employees rated 
unsatisfactory by their supervisors than these 
investigators were led to expect by the highly 
consistent trend established from the three 
other groups. Phillips (10) compared initial 
respondents, follow-up respondents, and non- 
respondents in a group of 93 Fisk University 
alumni on four factors—year of graduation, 
sex, marital status, and parenthood—and re- 
ports no statistically significant differences 
between his groups on any of these four fac- 
tors. As does the present experiment, these 
studies demonstrate the predictive ability of 
the response-nonresponse probability func- 
tion for a number of variables, and also its 
failure in the case of other variables. 
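The extrapolation that these studies test can be sketched as fitting a straight line to the group means taken in order of response and projecting it one step further, to the position of the nonrespondents. Equal spacing of the groups along the response-nonresponse dimension and a linear trend are assumptions of this sketch, not procedures specified by any of the studies cited; the Ford and Zeisel result is an example of the projection missing.

```python
from statistics import mean


def extrapolate_next(group_means):
    """Least-squares line through means at positions 1..k; predict position k + 1."""
    xs = list(range(1, len(group_means) + 1))
    x_bar, y_bar = mean(xs), mean(group_means)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, group_means))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept + slope * (len(group_means) + 1)


# Hypothetical means: initial respondents, first follow-up, second follow-up
predicted_nonrespondent_mean = extrapolate_next([10.0, 12.0, 14.0])  # 16.0
```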




Validity of Extrapolating Nonresponse Bias 


Because data on nonrespondents are available
for only a limited number of characteristics,
experimental evidence is mostly confined to
characteristics which, as a rule, are not them-
selves the topic under study in mail question-
naires, but there is every indication that
questionnaire items show a similar distribu-
tion along the response-nonresponse con-
tinuum (6).


Summary 


This experiment attempted to study the 
accuracy with which nonresponse bias in mail 
questionnaires can be extrapolated from the 
trend derived from the several coordinate 
values of initial and follow-up response groups 
for any given variable. This method is much 
more economical of time and effort than an 
interview sampling of nonrespondents. Data 
on seven variables were obtained for three 
groups, totaling 220 subjects and represent- 
ing three points on a response-nonresponse 
probability dimension. These data were em- 
ployed to evaluate the hypothesis that the 
response-nonresponse probability function in- 
dicates the presence and direction of nonre- 
sponse bias. The results indicate that of the 
seven variables considered, five were found to 
be consistent with the hypothesis and two not 
consistent with it. Furthermore, nonresponse 
bias occurred despite a very high rate of re- 
turn. These results were in keeping with the 
findings of other investigators. Large-scale, 
definitive studies on the predictability of nonresponse
bias for all relevant variables by
means of this method have yet to be con- 
ducted. It was concluded that in the ab- 
sence of such studies, exclusive reliance on 
this method to account for all possible as- 
pects of nonresponse bias must be suspect. 


Received June 20, 1955. 


References 


1. Barnette, W. L. The non-respondent problem 
in questionnaire research. J. appl. Psychol., 
1950, 34, 397-398. 

2. Baur, E. J. Response bias in a mail survey. 
Pub. Opin. Quart., 1947, 11, 594-600. 

3. Clausen, J. A., & Ford, R. N. Controlling bias
in mail questionnaires. J. Amer. statist. Ass.,
1947, 42, 497-511.

4. Deming, W. E. On a probability mechanism to
attain an economic balance between the re-
sultant error of response and the bias of non-
response. J. Amer. statist. Ass., 1953, 48,
743-772.

5. Edgerton, H. A., Britt, S. H., & Norman, R. D.
Objective differences among various types of
respondents to a mailed questionnaire. Amer.
sociol. Rev., 1947, 12, 435-444.

6. Ford, R. N., & Zeisel, H. Bias in mail surveys
cannot be controlled by one mailing. Pub.
Opin. Quart., 1949, 13, 495-501.

7. Norman, R. D. A review of some problems re-
lated to the mail questionnaire technique.
Educ. psychol. Measmt, 1948, 8, 235-247.

8. Pace, C. R. Factors influencing questionnaire
returns from former university students. J.
appl. Psychol., 1939, 23, 388-397.

9. Parten, Mildred. Surveys, polls, and samples.
New York: Harper, 1950.

10. Phillips, W. M. Weaknesses of the mail ques- 
tionnaire. Sociol. soc. Res., 1951, 35, 260-267. 


The Journal of Applied Psychology 
Vol. 40, No. 2, 1956


A Biographical Inventory for Students: 
II. Validation of the Instrument 


Laurence Siegel 


Miami University¹


The first article in this series described the 
development and standardization of a self- 
administering biographical inventory suitable 
for administration to male high school seniors 
and college freshmen (6). This Biographical 
Inventory for Students (BIS) yields scores on 
10 subscales which are relatively homogene- 
ous, independent, and reliable. These sub- 
scales are designated: Act—Action; Soc—So- 
cial Activities; Het—Heterosexual Activities; 
Rig—Religious Activities; LMA—Literature, 
Music, and Art; Pol—Political Activities; 
SEL—Socioeconomic Status; Eco—Economic 
Independence; Dep—Dependence upon the 
Home; Con—Social Conformity. 


Problem 


This paper describes a series of investiga-
tions designed to ascertain the usefulness of
the BIS as an adjunct to the counseling of 
students comparable to the standardization 
groups. Since the BIS items are factual in 
nature, they appear to be distinctly superior 
to items of the “I think . . .” or “I feel . . .” 
variety, thereby combining some of the ad- 
vantages of personal history forms with those 
of personality inventories. Validation of the 
10 BIS subscales, however, is a prerequisite to
appraisal of the usefulness of this instrument.

The investigations to be described related 
the subscales to four classes of criteria: schol- 
astic ability and achievement, personality in- 
ventories, values inventories, and vocational 
choice. Most of these criteria were derived 
from performance on psychometric instru- 
ments. Although such an approach to the 
validation of the BIS lacks finality, it does 
permit some understanding of the inter-
pretation to be accorded the subscale scores.


¹ A large portion of the research described in this
paper was completed while the writer was on the 
faculty of the State College of Washington, and on 
the staff of the Personnel Research Board of The 
Ohio State University. 


Procedure 


Data for the entire series of studies were obtained 
from the responses of three basic samples of college 
freshmen. 

Sample I: Entering freshmen at the State College 
of Washington (N = 154). The BIS was included
in a battery of instruments administered routinely 
during freshman orientation week. 

Sample II: Freshmen who had been in residence at 
the State College of Washington for one month prior 
to administration of the BIS (N = 334). These stu- 
dents were enrolled in a physical education course 
and were required to complete a variety of psycho- 
metric instruments as part of an integrated research 
program. 

Sample III: Superior freshmen at The Ohio State 
University (N = 66). The bases for classification of
these students as “superior” consisted of placement 
at the seventy-fifth percentile or higher on The Ohio 
State Psychological Examination, high school grades 
in the upper third of their graduating class, and a 
history of participation in extracurricular activities 
while attending high school. 

The research design is schematized in Table 1, 
which summarizes the criterion variables and the 
sample or subsample employed for each analysis. 


Results 


The statistically significant findings result- 
ing from these investigations (with the excep- 
tion of those utilizing vocational choice as the 
criterion) are summarized in Table 2. Any 
interpretation of these findings must be tem- 
pered by considerations of size and repre- 
sentativeness of the sample. 


Criteria of Ability and Achievement 


Although the BIS was not intended to be 
a test of scholastic ability, it seemed reason- 
able to expect that some of the subscales 
might correlate with achievement as defined 
by course grades. As expected, the BIS sub-
scales (with the exception of Act) are not
useful for the prediction of ability as defined
by ACE total score. Three of the subscales,
however, are significantly correlated with the
criterion of GPA.

² Analyses based upon this sample were conducted
by the staff of the Occupational Opportunities Serv-
ice, The Ohio State University.

The regression of Dep upon GPA was 
curvilinear (eta = .48) with a tendency for 
high grades to be earned at both ends of the 
Dep continuum. If we accept the premise 
that students earning mid-range Dep scores 
are in the process of establishing their inde- 
pendence from familial influence, it then fol- 
lows that such students would neither be as 
secure as those who have already completed 
the process of “psychological weaning” nor as 
secure as those who have not yet started on 
it. Such insecurity might be expected to re- 
sult in low grades. Of course, such an inter- 
pretation is extremely tentative and awaits 
further substantiation. 
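The correlation ratio (eta) captures a relationship of this U-shaped kind, which the product-moment r can miss entirely. The sketch below uses hypothetical data in which Dep scores have been grouped into three classes (the article does not state the class intervals used) and GPA is high at both ends; eta is near 1 while r is essentially 0. The function names are hypothetical.

```python
from statistics import mean


def correlation_ratio(classes, scores):
    """eta of scores on classes: sqrt(between-class SS / total SS)."""
    by_class = {}
    for c, y in zip(classes, scores):
        by_class.setdefault(c, []).append(y)
    grand = mean(scores)
    ss_total = sum((y - grand) ** 2 for y in scores)
    ss_between = sum(len(g) * (mean(g) - grand) ** 2 for g in by_class.values())
    return (ss_between / ss_total) ** 0.5


def pearson_r(xs, ys):
    """Product-moment correlation (sensitive to linear relationship only)."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


# Hypothetical: Dep grouped into low (1), middle (2), high (3) classes; GPA
dep_class = [1, 1, 2, 2, 3, 3]
gpa = [3.0, 3.2, 2.0, 2.2, 3.0, 3.2]
eta = correlation_ratio(dep_class, gpa)  # about .98
r = pearson_r(dep_class, gpa)            # essentially 0
```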

Another study relating BIS subscales to 
measures of intellectual ability and achieve-
ment was conducted with Sample III. Al-
though these data are based upon a highly
selected sample constituting one tail of the 
distributions of ability and achievement, there 
are two points of agreement between this 
study and the one discussed previously: Act 
correlates most significantly (in a negative 
direction) with the test of ability; and Het 
correlates negatively with the criteria of 
grades. This latter correlation is of particu- 
lar interest in view of the finding that the 
correlation between OSPE and PHR for this 
truncated sample was only .33. 

The relationships cited in Table 2 are con- 
gruent with logical expectations. The better 
students appear to participate in fewer
heterosexual, physical, and social activities 
than do the less capable. In summary, the 
picture is one of scholastic achievement ob- 
tained, to some extent, at the expense of non- 
intellectual pursuits. 


Personality Inventories 


Two personality inventories served as vali- 
dation criteria for the BIS. The first of these, 
the Heston Personal Adjustment Inventory (3)
provides six scores designated:


Analytical Thinking—“thinking introversion” and 
“intellectual independence” ; 

Sociability—making friends easily and taking the 
lead in social participation; 

Emotional Stability—ability to remain in stable 
and uniform spirits and to relax and avoid tension; 


Confidence—ability to make decisions readily and
confidently;

Personal Relations—feeling that other people are
trustworthy and congenial;

Home Satisfaction—pleasant family relations.


Table 1

Summary of Criterion Variables and Samples

Criterion                                            Sample (or Subsample)    N

1. Scholastic ability and achievement
   American Council on Education Examination (ACE)        I             154
   Grade-point average (GPA) after two semesters          I             154
   Ohio State Psychological Examination (OSPE)            III            66
   Point-hour ratio (PHR) at end of first quarter         III            66
   Rank in high school graduating class                   III            66
2. Personality Inventories
   Heston Personal Adjustment Inventory                   IIA           107
   Runner Personality Analysis                            III            66
3. Values Inventories
   Allport-Vernon Study of Values                         IIB            85
   Lurie Values Inventory                                 III            66
4. Vocational Choice
   Major field of study                                   IIC           185


All but one of the obtained correlations 
were compatible with our prior expectations. 
The unexpected relationship was that between 
LMA and Sociability. Perhaps the most 
parsimonious interpretation in the absence of 
corroborating evidence is to consider this cor- 
relation a result of the type of sampling error 
liable to occur whenever all variables in a 
matrix are intercorrelated. In addition, two 
relationships were hypothesized but were not 
substantiated by the data: Het with Confi- 
dence, and Soc with Personal Relations. 
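The sampling-error argument can be made concrete with a rough count. With 10 BIS subscales and 6 Heston scores there are 60 correlations; if none of the true correlations differed from zero and the tests were independent, about 3 would be expected to reach the .05 level by chance. The independence assumption does not strictly hold, since all coefficients come from the same subjects, so the figures below (from a hypothetical helper) are only a rough guide.

```python
def chance_significant(n_tests, alpha):
    """Expected number of nominally significant results, and the probability of
    at least one, if all null hypotheses were true and the tests were independent."""
    expected = n_tests * alpha
    p_at_least_one = 1 - (1 - alpha) ** n_tests
    return expected, p_at_least_one


# 10 BIS subscales x 6 Heston scores = 60 correlations tested at the .05 level
expected, p_any = chance_significant(10 * 6, 0.05)
print(f"expected by chance: {expected:.1f}; P(at least one): {p_any:.2f}")
# expected by chance: 3.0; P(at least one): 0.95
```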

The curvilinear regression of Dep upon 
Heston’s Confidence score yielded an eta of 
.55. Low Dep scores tended to be associated 
with mid-range Confidence scores, whereas 
high Dep scores were associated with the two 
extremes of the Confidence scale. The find- 




Table 2 


Summary of Significant Correlations
Between the BIS and External Criteria

Criterion   Act  Soc  Het  Rig  LMA  Pol  SEL  Eco  Dep  Con


Ability and Achievement 
ACE —.22"* 
GPA (after2sem.) —.25*** —.25*** 
OSPE S233 188-130 8F P= 268 


PHR (after 1 qtr.) —.36"* — 45+ 
H.S. rank —.34* 


Personality Inventories 
Heston: 
Analytic + 
Sociability Om ea oo te 
Confidence 123* .20* 
Home Satisfaction 
Runner: 
Conventionalism 
Practical Attitude 
Methodical 
Thing Interest 
Human Welfare 
Sociability ons agains je 
Exhibitionism 33t .30* 
Affectional 
Expression 31* 
Dependence sont 
Wish for Dominance JOTTA OOE 
Aggressiveness .28* Ake! ad 
Hostility 
Values Inventories 
Allport-Vernon: 
Aesthetic 
Social 
Political Lot .26* 
Religious 
Economic 
Theoretical 
Lurie: 
Aesthetic 
Social .26* 
Political Agee 
Religious .28* 
Economic Site 
Theoretical 


E tand (.48)t 
—.28* 


—.34e** 


3g* 30% 
128" 36st 21° 

25%* (.55)t 

22 


34t 32*** 31 
32 
.28* wae 
3400" 
.29* 
.30* 


.25* 


.29* .24* 
ea bes 


.25* 


3i* 


3g" 
—,.32** 


”* 
E vigad 
-32+ .26* 31** 
.30* 
LSet 26" 31%" 


* p < .05.
** p < .02.
*** p < .01.
† eta.


ing that students who are overly dependent 
upon the home (high Dep score) may either 
lack confidence or be overly confident seems 
quite plausible. If we assume that verbaliza- 


tions of lack of confidence or of overconfi- 
dence may be symptomatic of poor adjust- 
ment, we might then expect overdependence 
to be part of the total configuration. Such an
expectation does not imply causality. In-
deed, the lack of confidence (either verbal-
ized or camouflaged by bravado) and the de-
pendent relationship may both be reflections
of a more basic causal agent. 

The second criterion of personal adjust- 
ment was the Runner Personality Analysis 
(5) which provided 13 criterion scores de- 
fined as: 


Conventionalism—response to what ought to be;

Practical Attitude—concern with utilitarian and
material values;

Methodical—pre-planned, systematic performance
of activities;

Things Interest—pleasure in hand skill activities;

Human Welfare—the desire to be of social service;

Sociability—desire to be with people;

Exhibitionism—desire to be the center of atten-
tion;

Affectional Expression—the need for warm per-
sonal relationships with others;

Dependence—the need for help and attention from
others;

Wish for Dominance—love of power and au-
thority over people;

Competitiveness—stimulation of effort by the de-
sire to win out over others;

Aggressiveness—the active expression of dissatis-
faction with other people;

Hostility—passive resentment; distrust of others.

This analysis was conducted without the 
formulation of prior hypotheses in view of the 
fact that the basis of interpretation of the 
Runner is the configuration of subtest scores 
rather than any given score per se. Never- 
theless, a number of significant correlations 
which are readily explicable, and might have 
been predicted, appear in the matrix. 


Values Inventories 

The BIS subscales were related to scores 
on the Allport-Vernon Study of Values (1) 
and the Lurie Values Inventory (4). Both 
of these criterion instruments yield scores on 
the six variables classified by Spranger (7). 
Four consistent relationships appear in both 
the Allport-Vernon and the Lurie: Soc with Po-
litical values; Rig with Religious values; LMA
with Aesthetic values; and Pol with Political values.
Aside from these consistencies, it appears that 
the BIS bears greater resemblance to the 
Lurie than to the Allport-Vernon. 


Vocational Choice 


The 334 students in Sample II were classi- 
fied on the basis of major field of study. 
Three of the subgroups-by-major contained 
more than 30 subjects and were therefore 
considered suitable for further analysis: 
Business Administration (N = 34), Agricul-
ture (N = 51), and Engineering (N = 97).
Mean BIS subscale scores for these groups
were compared by computation of t ratios for
significance of difference. Three of the sub-
scales differentiated between these groups.
(a) Business Administration majors, as a
group, scored higher than Agriculture ma-
jors (p < .001) and Engineering majors (p
< .05) on Soc. Furthermore, Engineering
majors scored significantly higher on this
subscale than did majors in Agriculture (p
< .01). (b) Students majoring in Agricul-
ture scored higher on Rig than did Engineer-
ing students (p < .001). (c) Majors in
Business Administration had a higher mean 
score on LMA than majors in Agriculture 
(p < .05). The direction of these differ- 
ences lends support to the validity of the in- 
strument. 
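The t ratios for differences between majors can be computed from group summary statistics. This is a sketch assuming the pooled-variance form of Student's t; the article reports only the resulting probabilities, so the means and standard deviations in the example are hypothetical placeholders and only the group sizes come from the text.

```python
import math


def pooled_t(mean1, sd1, n1, mean2, sd2, n2):
    """Student t for two independent group means with pooled variance.

    sd1 and sd2 are sample standard deviations (n - 1 divisor)."""
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / df
    se = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (mean1 - mean2) / se, df


# Group sizes from the text (Business Administration, Agriculture);
# means and SDs are hypothetical placeholders
t, df = pooled_t(mean1=14.0, sd1=4.0, n1=34, mean2=11.0, sd2=4.0, n2=51)
```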

Discussion 


We have cited correlations between the 
BIS subscales and 10 criteria, four of which 
yielded multiple scores. Collation of the 
validities obtained for each subscale will 
serve to clarify the significance and inter- 
pretation of scores on the BIS. 

Act—Action. This subscale is negatively 
related to intellectual ability and achieve- 
ment, except in the case where PHR for a 
truncated sample served as the criterion. 
High Act scores correlate with several cri- 
teria which may be broadly classified as “so- 
cial” and “striving”: i.e., Confidence on the 
Heston; Sociability, Wish for Dominance, 
and Aggressiveness on the Runner; Political 
on the Allport-Vernon. 

The significant correlation between Act and 
Religious Values on the Lurie is inexplicable 
in the light of the total configuration. 

Soc—Social Activities. Although this scale 
does not correlate with either intellectual 
ability or achievement when the full criterion 
range is available, it is negatively related to
these criteria for the sample of superior stu-
dents. It appears that the best of the su-
perior students either are not interested in or
haven’t the time for such social activities. 

Soc scores are strongly correlated with the 
indices of interest in people, either as ends in 
themselves or as means to gratification of 
power strivings, provided by all of the cri- 
terion instruments. The validity of this sub- 
scale is further supported by the fact that the 
students majoring in Business Administration 
score significantly higher than stu-
dents majoring in Agriculture and Engineering.

Het—Heterosexual Activities. Scores on 
this variable correlate negatively with criteria 
of scholastic achievement and positively with 
criteria of “sociability” as defined by the 
Heston and Runner. Furthermore, this sub- 
scale is related to Runner’s Exhibitionism and 
Aggressiveness scales. 

Rig—Religious Activities. Significant cor- 
relations appeared when Rig was related to 
indices of attitudes which might be classified 
as “moralistic” and “interest in social wel-
fare activities”; e.g., Conventionalism and Hu-
man Welfare on the Runner, Religious values
on both the Allport-Vernon and the Lurie, 
and Social values on the Lurie. 

The negative relationship between Rig and 
Theoretical values on the Allport-Vernon is 
somewhat substantiated by the fact that En- 
gineering majors scored lower on this sub- 
scale than did majors in Agriculture. 

LMA—Literature, Music and Art. The 
primary criterion for validation of this sub-
scale was “Aesthetic Values” as defined by
the Allport-Vernon and the Lurie. Signifi-
cant correlations were obtained in both cases.
These validities are supported by the fact 
that LMA correlated with Heston’s Analytic 
scale (which is defined as “thinking intro- 
version”). 

Pol—Political Activities. This BIS sub- 
scale correlates with Political Values as de- 
fined by both the Allport-Vernon and the 
Lurie. It is also related to assorted criteria 
of a methodological and analytic approach to 
problems. 

SEL—Socioeconomic Status. Contrary to 
our prior hypotheses, SEL did not correlate 
with Economic values. However, this sub- 


scale is clearly related to concerns of utility 
and practicality as well as to caution and 
adherence to tradition as measured by the 
Runner. 

A convincing demonstration of the validity 
of SEL scores was cited in the first paper in 
this series (6) wherein college freshmen were 
reported to score significantly higher than 
high school seniors. 

Eco—Economic Independence. Our origi- 
nal expectation was that this subscale would 
correlate with Economic values of the type 
measured by values inventories. Since this 
expectation was not supported by the data, 
we must conclude that economic independ- 
ence as measured by the BIS is unrelated to 
placing high value upon monetary goals. 
However, this subscale does yield readily ex- 
plicable correlations with Heston’s Confi- 
dence score and Runner’s index of pleasure 
in hand-skill activities. 

Dep—Dependence upon the Home. Sev- 
eral relationships between this predictor and 
criteria of so-called “wholesome adjustment” 
were obtained: i.e., Home Satisfaction on the 
Heston; Conventionalism, Affectional Expres-
sion, and Sociability on the Runner; Re-
ligious and Social values on the Lurie.

Con—Social Conformity. Scores on Con 
are related to indices of a conventional and 
methodical approach to matters as defined 
by the Runner, and to altruistic values (So- 
cial and Religious) as measured by the Lurie. 


Received October 1, 1954. 


References 


1. Allport, G. W., & Vernon, P. E. A study of 
values. Boston: Houghton Mifflin, 1931. 

2. American Council on Education. Psychological 
examination for college freshmen. Princeton: 
Educational Testing Service, 1948. 

3. Heston, J. C. Personal adjustment inventory.
Yonkers, N. Y.: World Book, 1949.

4. Lurie, W. A. A study of Spranger’s value types
by the method of factor analysis. J. soc.
Psychol., 1937, 8, 17-37.

5. Runner, Jessie R., & Runner, K. A gestalt analy-
sis of personality. Unpublished manuscript,
Iowa City, 1952.

6. Siegel, L. A biographical inventory for students: 
I. Construction and standardization of the in- 
strument. J. appl. Psychol., 1956, 40, 5-10. 

7. Spranger, E. Types of men. New York: Stechert, 
1928. 


The Journal of Applied Psychology
Vol. 40, No. 2, 1956


Equivalence of Forms of the Wonderlic Personnel Test: A
Study of Reliability and Interchangeability


H. B. Weaver
University of Hawaii

and C. A. Boneau
Duke University


The Wonderlic Personnel Test is one of the 
most widely used group tests of general in- 
telligence. It was designed especially for in- 
dustrial use, is short (12-minute time limit), 
virtually self-administering, and easy to score. 
Industrial norms are available on approxi-
mately 37,000 cases. It is published in five
forms (A, B, D, E, and F); Forms D, E, and
F consist of items selected from the Otis
Higher, and Forms A and B consist of similar
items but of a wider variety.

The five forms are claimed by the author 
of the test to be equivalent and interchange- 
able, and the published norms accordingly 
are not differentiated by form. This paper 
presents evidence, however, that there are 
substantial differences in difficulty among the 
forms and that they are by no means inter- 
changeable. 

I 


The study began with a general impression that 
Form A was yielding higher scores than Form B in 
routine use in a large industrial concern. A check
on the records of nearly 300 cases in which the two 
forms had been given at random to supervisory 
candidates showed that Form A seemed to be dis- 
tinctly easier than Form B, the mean difference be- 
ing 3.3 score points. This corresponds to a differ- 
ence of from 10 to 20 centile units in the average 
range, varying somewhat with the norms used. 


II 


This suggestive finding led to the following 
study: 30 students comprising a class in psy- 
chology were given both Form A and Form 
B, half getting A first and half B first. It 
was found that the group averaged 1.97 more 
correct answers on Form A than on B. The 
difference yields a t ratio of 2.61, which is
significant at the 2% level for 29 df. The 
previous results were therefore confirmed. 
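Since every student took both forms, the t ratio here is computed on the 30 difference scores, with 30 - 1 = 29 df as reported. The sketch assumes this paired form (the article does not show the computation; the function name is hypothetical) and also works backward from the reported mean difference and t to the standard deviation of the difference scores they imply.

```python
import math
from statistics import mean, stdev


def paired_t(scores_a, scores_b):
    """t ratio for the mean difference when each subject took both forms."""
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    n = len(diffs)
    return mean(diffs) / (stdev(diffs) / math.sqrt(n)), n - 1


# Working backward from the reported figures (mean difference 1.97, t = 2.61,
# n = 30): SD of the difference scores implied by t = d / (s_d / sqrt(n))
implied_sd = 1.97 * math.sqrt(30) / 2.61  # about 4.13
```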


III


Although these independent findings rather 
clearly indicated that Form A is easier than 
B, the question of differences in difficulty 
among all five forms remained unanswered. 
Accordingly, a further study was undertaken.
Seventy Ss, comprising two classes in psy-
chology, were given all five forms in counter-
balanced order as shown in Table 1. This 
“unit” design for five Ss was replicated 14 
times, making a total of 70 Ss. 
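The counterbalanced order can be sketched as a cyclic 5 × 5 Latin square, which appears to be the layout of Table 1: each form occupies each position of administration exactly once across the five subjects of a unit. The function names are hypothetical.

```python
FORMS = ["A", "B", "D", "E", "F"]


def cyclic_latin_square(items):
    """square[position][subject] = items[(position + subject) % k]."""
    k = len(items)
    return [[items[(p + s) % k] for s in range(k)] for p in range(k)]


def is_latin_square(square):
    """Each symbol appears exactly once in every row and every column."""
    symbols = sorted(square[0])
    rows_ok = all(sorted(row) == symbols for row in square)
    cols_ok = all(sorted(col) == symbols for col in zip(*square))
    return rows_ok and cols_ok


square = cyclic_latin_square(FORMS)
# Order of forms for each subject in one five-subject unit (read down a column)
unit = [[row[s] for row in square] for s in range(len(FORMS))]
# Replicating the unit 14 times gives the 70 administration schedules
schedule = unit * 14
```

A cyclic square of this kind balances position but not sequence: Form A, for example, is always followed immediately by Form B. That suffices for equalizing practice and fatigue effects by position, which is the stated purpose of the design.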

Analysis of the resulting data by forms 
yielded the mean scores and SD’s shown in 
Table 2. It will be noted that the difference 
between Forms A and B is 1.95, almost ex- 
actly the same as the 1.97 obtained previ- 
ously on a similar sample. The maximum 
difference is 5.07, between Forms B and F. 
The average of the 10 pairwise differences is 2.34.

Inspection of the means and SD’s of Table 
2 seems to indicate that the forms fall roughly 
into two groups, A and B comprising a group 
of greater difficulty and higher variability 
than D, E, and F. This accords with the 
history of the development of the test. 
Forms D, E, and F are made up of items 
selected from the Otis Higher, while A and 


Table 1

Paradigm for Equalizing Practice and Fatigue Effects
Among the Five Forms A, B, D, E, F

Order of                  Subject
Administration     1    2    3    4    5
First              A    B    D    E    F
Second             B    D    E    F    A
Third              D    E    F    A    B
Fourth             E    F    A    B    D
Fifth              F    A    B    D    E




Table 2 
Means and SD’s of Scores on the Five Forms for 70 Ss 


Form A B D E F 
Mean 29.79 27.84 31.37 31.31 32.91 
SD 6.01 7.39 5.73 5.57 5.7g 


B were developed later and include types of 
items not found in the Otis. 

The data were subjected to analysis of 
variance as shown in Table 3. Individual 
differences were, of course, highly significant, 
yielding an F of 10.92. Form differences 
yielded an F of 20.15, also highly significant. 
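The analysis in Table 3 treats the data as a subjects × forms table with one score per cell, the remainder (interaction) term serving as error. The sketch below (hypothetical function name; assumes a complete table with no missing scores) shows the decomposition and then recomputes the two F ratios from the sums of squares and df reported in Table 3.

```python
from statistics import mean


def subjects_by_forms_anova(scores):
    """scores[i][j]: score of subject i on form j, one observation per cell.

    The subjects x forms remainder serves as the error term."""
    n, k = len(scores), len(scores[0])
    cells = [x for row in scores for x in row]
    grand = mean(cells)
    ss_total = sum((x - grand) ** 2 for x in cells)
    ss_subjects = k * sum((mean(row) - grand) ** 2 for row in scores)
    ss_forms = n * sum((mean(col) - grand) ** 2 for col in zip(*scores))
    ss_error = ss_total - ss_subjects - ss_forms
    df_subjects, df_forms = n - 1, k - 1
    df_error = df_subjects * df_forms
    ms_subjects = ss_subjects / df_subjects
    ms_forms = ss_forms / df_forms
    ms_error = ss_error / df_error
    return {
        "df": (df_subjects, df_forms, df_error),
        "F_subjects": ms_subjects / ms_error,
        "F_forms": ms_forms / ms_error,
    }


# Check of the F ratios from the sums of squares and df reported in Table 3
ms_error = 3525.9 / 276
f_individuals = (9626.1 / 69) / ms_error  # about 10.92
f_forms = (1030.1 / 4) / ms_error         # about 20.16 (20.15 with rounded mean squares)
```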

Each of the 10 possible differences between 
forms was tested individually by computing 
a t ratio based on the error variance estimate 
of Table 3. The differences and the results 
of the t tests are shown in Table 4. Nine of
the 10 differences are significant, two at the 
2% level, three at the 1% level, and four at 
the .1% level. 
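The individual tests can be reproduced approximately from the means in Table 2 and the error variance in Table 3. The sketch assumes the standard error of a difference between two form means is √(2 × 12.78 / 70) ≈ 0.60, and it approximates t quantiles for 276 df with the Cornish-Fisher expansion rather than a printed table; the article states only that the error variance of Table 3 was used. Under these assumptions the sketch classifies the 10 differences as the text does: nine significant, two at the 2% level, three at the 1% level, and four at the .1% level.

```python
import math
from itertools import combinations
from statistics import NormalDist

MEANS = {"A": 29.79, "B": 27.84, "D": 31.37, "E": 31.31, "F": 32.91}  # Table 2
MS_ERROR, DF_ERROR, N_SUBJECTS = 12.78, 276, 70                       # Table 3


def t_quantile(p, df):
    """Approximate t quantile by the Cornish-Fisher expansion (good for large df)."""
    z = NormalDist().inv_cdf(p)
    g1 = (z ** 3 + z) / 4
    g2 = (5 * z ** 5 + 16 * z ** 3 + 3 * z) / 96
    return z + g1 / df + g2 / df ** 2


# Standard error of a difference between two form means (70 Ss each)
se_diff = math.sqrt(2 * MS_ERROR / N_SUBJECTS)

# Smallest difference significant at each two-tailed level
critical = {a: t_quantile(1 - a / 2, DF_ERROR) * se_diff for a in (0.05, 0.02, 0.01, 0.001)}

diffs = {a + b: round(abs(MEANS[a] - MEANS[b]), 2) for a, b in combinations(MEANS, 2)}


def smallest_level(d):
    """Most stringent tabled level at which difference d is significant (None if n.s.)."""
    passed = [a for a, c in critical.items() if d > c]
    return min(passed) if passed else None


levels = {pair: smallest_level(d) for pair, d in diffs.items()}
```

For example, `levels["BF"]` is 0.001 and `levels["DE"]` is None.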

It may be noted that the differences are 
not merely statistically significant but are 
also of practical significance because of the 
nature of the test and the recommended man- 
ner of its use. The Wonderlic is very com- 
monly used as a screening device, and criti- 
cal scores are set for employment and other 
purposes. If the forms are used interchange- 
ably it would therefore appear that meeting 
or failing to meet a given minimum score 
would in many cases depend on the particular 
form an applicant chanced to be given. This 
chance factor is as high as 5 points, which is 
10% of the total possible score range, and in 
centile units of the published norms may 
amount to 37% of the distribution (4, Table 
1, Norms for male college students). The 
average interform difference, 2.34, is 4.7% of 
the total possible score range and in centile 
units may be as much as 20% of the dis- 
tribution. In view of these variations in diffi- 
culty, separate tables of norms should be 
constructed for each form, or conversion for- 
mulas should be developed to permit com- 
parison of scores on different forms. 

Apart from variations in difficulty level, the 
question of varying reliability of the forms
may also be considered. In a study employ-
ing Forms D, E, and F, Wright and Laing
(6) presented evidence that the forms vary 
in reliability. The nature of the experimental 
design of the present study does not permit 
calculation of interform reliability coefficients 
for all Ss because of the indeterminate effects 
of the counterbalancing on the correlation of 
any two forms. Interform coefficients based 
on the first and second forms administered to 
each S may, however, be computed validly.
In the light of the paradigm shown in Table 
1, it was possible to compute the product- 
moment coefficients shown in Table 5, each r 
being based on 14 cases. The corresponding 
standard errors of measurement are based on 
the average of the SD’s of the forms involved 
(using the values for the entire sample of 70 
shown in Table 2). 
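The standard error of measurement for a pair of forms is SD × √(1 − r), where SD is here the average of the two forms' standard deviations, as described above. A sketch with a hypothetical function name, using Forms B and D (SDs from Table 2, r = .63 from the text); Table 5 itself is not reproduced here.

```python
import math


def sem(sd_x, sd_y, r):
    """Standard error of measurement from an interform r, using the mean of two SDs."""
    return (sd_x + sd_y) / 2 * math.sqrt(1 - r)


sd_b, sd_d = 7.39, 5.73           # Table 2, all 70 Ss
sem_bd = sem(sd_b, sd_d, 0.63)    # about 3.99
```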

The reliability coefficients and standard 
errors of measurement among Forms D, E, 
and F only may be compared with those re- 
ported by Wonderlic and Hovland (5), shown 
in Table 5 footnotes. Despite the restricted 
sample size, there is remarkably close agree- 
ment with the findings of Wonderlic and 
Hovland. Unfortunately no similar data in- 
volving Forms A and B have been published 
and no comparison with other studies is pos- 
sible for the remaining coefficients of Table 
5. However, to provide a check on the low- 
est coefficient, that of .63 for Forms B and 
D, these forms were administered to an independent similar sample of 28 cases. The resulting r was .65, which when combined with the previous 14 cases yields an r of .64. This


Table 3

Variance Table for Form Difference Data

Source of Variation              Sum of Squares    df    Variance Estimate
Rows: Individual differences*           9,626.1    69              139.5
Columns: Form differences**             1,030.1     4              257.5
Remainder: Error                        3,525.9   276              12.78
Total                                  14,182.1   349

* F = 139.5/12.78 = 10.92; df = 69/276; P less than .001.
** F = 257.5/12.78 = 20.15; df = 4/276; P less than .001.


Wonderlic Personnel Test 


would further indicate that the coefficients of 
Table 5 are stable. 
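The text does not say how the two samples were combined. Pooling the raw scores of all 42 cases and recomputing r is the direct route. When only the two coefficients are at hand, a common approximation averages them on Fisher's z scale, weighting each by n − 3. The sketch below implements that substitute method, not necessarily the authors' procedure. It reproduces the reported .64.

```python
import math


def combine_correlations(rs, ns):
    """Combine independent correlations via Fisher's z-transformation.

    Each r is converted with atanh, averaged with weights n - 3
    (the reciprocal of the sampling variance of z), and converted
    back with tanh.
    """
    if len(rs) != len(ns) or any(n <= 3 for n in ns):
        raise ValueError("need matching lists with every n > 3")
    weights = [n - 3 for n in ns]
    z_bar = sum(w * math.atanh(r) for w, r in zip(weights, rs)) / sum(weights)
    return math.tanh(z_bar)


# Forms B and D: r = .63 on 14 cases, r = .65 on 28 further cases.
print(round(combine_correlations([0.63, 0.65], [14, 28]), 2))  # 0.64
```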

It may be noted that the r's involving Forms A and B are lower than those involving D, E, and F alone, and the standard errors of measurement are higher. This would add further weight to the view presented above that Forms A and B appear to comprise a group having different properties from the group made up of D, E, and F. The small size of the samples on which the r's are based, however, makes interpretation tentative.

Conclusions 

The Wonderlic forms vary in level of difficulty and cannot be regarded as equivalent


Table 4
Form Differences and t-Test Results*

Forms      Difference    Probability
A and B       1.95       less than .01
A and D       1.58       less than .01
A and E       1.52       less than .02
A and F       3.12       less than .001
B and D       3.53       less than .001
B and E       3.47       less than .001
B and F       5.07       less than .001
D and E       0.06
D and F       1.54       less than .02
E and F       1.60       less than .01
Total        23.44
Average       2.34       less than .001

* Differences needed for significance (276 df): 1.19 at P .05; 1.41 at P .02; 1.57 at P .01; 2.01 at P .001.


Table 5

Interform Reliability Coefficients and Standard Errors of Measurement

Forms       A-B     B-D      D-E      E-F       F-A
r           0.79    0.63*    0.86**   0.87***   0.80
σ meas.     3.04    3.99     2.11     2.00      2.50

* 0.64 when combined with an independent similar sample of 28 cases (σr = 0.093).
** Compare with 0.85 and 0.88 reported by Wonderlic and Hovland (5).
*** Compare with 0.89 and 0.92 reported by Wonderlic and Hovland (5).


and interchangeable. Separate tables of norms 
should be constructed for each form, or con- 
version formulas should be developed to per- 
mit comparison of scores. The forms appear 
also to vary in reliability. 


Received June 13, 1955. 


References 


1. Buros, O. K. (Ed.) The third mental measure- 
ments yearbook. New Brunswick, New Jer- 
sey: Rutgers Univer. Press, 1949. Pp. 341- 
350. 

2. McNemar, Q. Psychological statistics. New York: 
Wiley, 1949. 

3. Melton, A. W. The methodology of experimental studies of human learning and retention. Psychol. Bull., 1936, 33, 305-394.

4. Wonderlic, E. F. Wonderlic Personnel Test 
Manual. Northfield, Illinois, 1945. 

5. Wonderlic, E. F., & Hovland, C. I. The person- 
nel test: a restandardized abridgment of the 
Otis S-A test for business and industrial use. 
J. appl. Psychol., 1939, 23, 685-702. 

6. Wright, J. H., & Laing, D. M. The time factor 
in the administration of the Wonderlic Per- 
sonnel Test. J. appl. Psychol., 1943, 27, 316- 
319. 


The Journal of Applied Psychology 
Vol. 40, No. 2, 1956 


Experimental Manipulation of the Halo Effect 


Donald M. Johnson and Robert N. Vidulich 
Michigan State University 


The halo effect was briefly noted in 1907 
by Wells (6) and christened in this Journal 
in 1920 by Thorndike (5). Since then nearly 
all discussions of ratings have listed this ef- 
fect as one of the errors to be avoided. Yet 
when the evidence was reviewed in 1945 (2) 
and again in 1955 (3), no convincing proof 
of the existence of halo could be found. The 
present report is offered as the first published 
verification of this 50-year-old concept. 

There is no doubt about the facts to which 
this luminous phrase is applied. When any- 
one is asked to rate several individuals on 
several traits, he will rate some individuals 
high on most traits and some low on most 
traits. If the ratings of the individuals on 
any two traits are correlated, the correlation 
will be positive and often quite large. 

There is some doubt about the interpreta- 
tion of the facts. Conventionally, the corre- 
lation between trait ratings is considered an 
error of judgment, at least in part. The rater 
has a generally favorable or unfavorable atti- 
tude toward each individual that influences 
his ratings of the individual on each trait. 
The favorable attitude is the one, of course, that gave halo its name. After the rater has
cast a halo around his subject, he is so daz- 
zled by its radiance that he cannot differenti- 
ate the subject’s separate qualities. If the 
rater judged the individuals on each trait 
separately and analytically, as requested, the 
traits would not be correlated. 

The criticism of this interpretation is that 
the rater may be right. Perhaps the correla- 
tion between the traits is an objective fact 
rather than an error of judgment (2). Or at 
least it may be the information available to 
the rater rather than his judgment that is in 
error. The correlation between trait ratings 
may be due to an objective correlation, to 
correlation in the rater’s information, or to 
the rater’s inability to judge the traits separately, i.e., a halo effect. Statistical analysis of trait ratings will not by itself locate the source of the correlation.

The common recommendation for avoiding 
the halo effect gives a clue to the proper 
method of investigation. Several writers, e.g., 
Symonds (4), have recommended rating all 
individuals on one trait at a time, rather than 
one individual at a time on all traits. Pre- 
sumably this will lessen confusion between 
traits and thus reduce the halo effect. Such 
a reduction in the halo effect would support 
the hypothesis that the effect resides in the 
judging process rather than in the objective 
facts.

One reason why there is little evidence on 
halo is the number of correlation coefficients 
involved. If someone rates a number of in- 
dividuals on five traits, 10 intertrait correla- 
tions have to be computed. If 10 raters are 
used, 100 correlations are required. If the 
ratings are made under two conditions, the 
experiment involves 200 correlations. 
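The counts in the paragraph above are the number of trait pairs, C(5, 2) = 10, multiplied by the number of raters and then by the number of conditions. The short check below uses an illustrative function name.

```python
from math import comb


def correlations_needed(n_traits: int, n_raters: int = 1,
                        n_conditions: int = 1) -> int:
    """Intertrait correlations needed when each rater's ratings are
    correlated for every pair of traits, separately per condition."""
    return comb(n_traits, 2) * n_raters * n_conditions


print(correlations_needed(5))         # 10
print(correlations_needed(5, 10))     # 100
print(correlations_needed(5, 10, 2))  # 200
```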

Fortunately, the more efficient analysis of 
variance has recently been applied to the entire treatment of ratings by Guilford (1).
His elegant rationale for errors of rating sim- 
plifies the calculations and clarifies the inter- 
pretation of the results. He distinguishes be- 
tween two components of the halo effect, that 
common to different raters and that varying 
from rater to rater. If the ratees, or indi- 
viduals being rated, are not all rated equally, 
some getting higher ratings than others, the 
analysis discloses a significant variance be- 
tween ratees. This between-ratee variance is 
the more “objective” component of the halo 
effect in the sense that it is common to the 
different raters, who may all have been exposed to the same information. If some raters rate some ratees relatively higher than the other raters do, the analysis discloses a significant interaction variance, which Guilford calls the relative halo effect. It is relative to the raters, traits, and ratees at hand and is estimated in relation to


Experimental Manipulation of the Halo Effect

Table 1
Mean Ratings of Five Prominent Individuals on Five Traits by 18 College Students Under
Conditions that Maximize the Halo Effect

                                      Personal
Ratee                   Intelligence  appearance  Kindliness  Courage  Usefulness  Means
Queen Elizabeth             8.50        7.94        7.67       7.33      6.17      7.52
Senator McCarthy            8.06        6.00        3.67       8.22      5.28      6.25
Sir Winston Churchill       9.17        6.17        8.00       8.44      8.89      8.13
Mrs. Eleanor Roosevelt      8.17        6.00        7.72       7.22      6.28      7.08
Pope Pius XII               9.17        7.50        8.56       8.56      8.50      8.46

Means                       8.61        6.72        7.12       7.96      7.02      7.49


deviations from the grand mean of all ratings 
by all raters. 
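The partition described above can be sketched as a two-way analysis of variance with replication. The two factors are raters and ratees, and the five trait ratings in each rater-ratee cell serve as replicates, which is the "within sets" term of Part I of Tables 3 and 4. This pure-Python version is an illustrative reconstruction, not Guilford's own computing routine. The function name and the nested-list layout are assumptions.

```python
def halo_anova(data):
    """Two-way analysis of variance with replication.

    data[i][j] is the list of ratings rater i gave ratee j, one per
    trait, so traits serve as the replicates within each cell.
    Returns sums of squares, degrees of freedom, mean squares
    ("variance estimates"), and F ratios against the within-sets term.
    """
    a = len(data)          # raters
    b = len(data[0])       # ratees
    n = len(data[0][0])    # ratings per cell (traits)
    values = [x for row in data for cell in row for x in cell]
    grand = sum(values) / len(values)

    cell_means = [[sum(cell) / n for cell in row] for row in data]
    rater_means = [sum(row) / b for row in cell_means]
    ratee_means = [sum(cell_means[i][j] for i in range(a)) / a for j in range(b)]

    ss_total = sum((x - grand) ** 2 for x in values)
    ss_raters = b * n * sum((m - grand) ** 2 for m in rater_means)
    ss_ratees = a * n * sum((m - grand) ** 2 for m in ratee_means)
    ss_cells = n * sum((m - grand) ** 2 for row in cell_means for m in row)

    ss = {
        "R": ss_raters,
        "I": ss_ratees,
        "RxI": ss_cells - ss_raters - ss_ratees,  # relative halo term
        "within": ss_total - ss_cells,
    }
    df = {"R": a - 1, "I": b - 1, "RxI": (a - 1) * (b - 1), "within": a * b * (n - 1)}
    ms = {k: ss[k] / df[k] for k in ss}
    f = {k: ms[k] / ms["within"] for k in ("R", "I", "RxI")}
    return ss, df, ms, f


# Toy data: 2 raters x 2 ratees x 2 traits (not the study's ratings).
ratings = [[[1, 3], [5, 7]], [[2, 4], [6, 8]]]
ss, df, ms, f = halo_anova(ratings)
print(df)  # {'R': 1, 'I': 1, 'RxI': 1, 'within': 4}
```

With 18 raters, 5 ratees, and 5 traits, the degrees of freedom come out as 17, 4, 68, and 360, which match Part I of Tables 3 and 4. The `RxI` entry is the relative halo term.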

These two components of the halo effect 
calculated by analysis of variance are, of 
course, open to the same questions as the 
halo effect calculated by correlations between 
trait ratings. To prove that these halo effects, or some part of them, are due to an
error of judgment rather than to objective 
facts, we may experimentally manipulate the 
conditions of judgment, while holding information about the facts constant, and compare the variances obtained under the different conditions.


Procedure 


Two groups of raters were used, one working un- 
der conditions designed to maximize the halo effect, 
the other working under conditions designed to 
minimize the halo effect. The ratees were five well- 
known living persons. The traits were five varied 
but familiar characteristics. Ratings were made on 
a scale from 1 to 10, with 10 being high. Since five 
ratings were made and the classes met two days a 
week, the experiment covered two and a half weeks. 
The classes were treated the same except as described below.


The maximization group was given on the first day a list of traits and the name of one ratee, as follows:


QUEEN ELIZABETH 


Intelligence 
Personal appearance 
Kindliness 
Courage 
Usefulness 


They were told to rate this individual in respect 
to each trait. Examples were given. At the next 
class meeting they were given another ratee and the 
same list of traits. And so on for five ratees. 

The minimization group was given on the first day 
a list of ratees and one trait, as follows: 


Queen Elizabeth
Senator Joseph McCarthy
Sir Winston Churchill
Mrs. Eleanor Roosevelt
Pope Pius XII


They were asked to rate each individual in re- 
spect to this trait. Examples were given. At the 
next class meeting they were given another trait and 
the same list of ratees. And so on for five traits. 

All raters in each group made their judgments in 
the same order. These orders are shown in the lists 
printed above. 
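The two presentation schedules differ only in which factor is held fixed within a session. The sketch below builds both schedules from the lists above. It assumes that the minimization group's sessions followed the trait list in its printed order, which the text implies but does not show directly. The function name is illustrative.

```python
RATEES = [
    "Queen Elizabeth",
    "Senator Joseph McCarthy",
    "Sir Winston Churchill",
    "Mrs. Eleanor Roosevelt",
    "Pope Pius XII",
]
TRAITS = ["Intelligence", "Personal appearance", "Kindliness", "Courage", "Usefulness"]


def session_plan(condition):
    """One list of (ratee, trait) judgments per class session.

    "maximize": one ratee per session, rated on every trait.
    "minimize": one trait per session, applied to every ratee.
    """
    if condition == "maximize":
        return [[(ratee, trait) for trait in TRAITS] for ratee in RATEES]
    if condition == "minimize":
        return [[(ratee, trait) for ratee in RATEES] for trait in TRAITS]
    raise ValueError("condition must be 'maximize' or 'minimize'")


for day, judgments in enumerate(session_plan("minimize"), start=1):
    print(f"Session {day}: {judgments[0][1]} for {len(judgments)} ratees")
```

Both plans cover the same 25 ratee-trait judgments. Only their grouping into sessions differs, and that grouping is the experimental manipulation.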

The hypothesis was that rating one individual on 
all traits at once would increase the halo effects, and 
rating one trait at a time would reduce them. That 
is, the minimization group followed the procedure 
commonly recommended for reducing the halo effect. 

The raters were students of two classes in Ef- 
fective Study at Michigan State University. The 
maximization group began with 24, the minimiza- 
tion group with 23. Students who were absent from 
any of the five experimental days were eliminated, 
so both groups were reduced to 18 raters: 14 men 
and 4 women in the maximization group, 13 men and 
5 women in the minimization group. In respect to 
intelligence (ACE), college class, age, a rough meas- 
ure of family income, and religion, the groups were 
closely comparable. No explanation of the purpose 
of the experiment was given. The students were 
accustomed to simple classroom projects of this kind. 


Results 


In general, the ratings shown in Tables 1 
and 2 are about as expected. Sir Winston 
Churchill was rated very high on intelligence, 
Senator McCarthy very low on kindliness. 
The largest standard deviation was for Sena- 
tor McCarthy in respect to usefulness. The 
grand means were approximately the same 
under both conditions, as were the total vari- 
ances. 


Donald M. Johnson and Robert N. Vidulich

Table 2
Mean Ratings of Five Prominent Individuals on Five Traits by 18 College Students Under
Conditions that Minimize the Halo Effect

                                      Personal
Ratee                   Intelligence  appearance  Kindliness  Courage  Usefulness  Means
Queen Elizabeth             6.67        8.94        7.44       6.72      6.28      7.21
Senator McCarthy            6.78        6.89        4.00       7.83      6.06      6.31
Sir Winston Churchill       9.06        6.44        6.56       8.33      8.72      7.82
Mrs. Eleanor Roosevelt      7.83        6.28        7.78       6.78      6.33      7.00
Pope Pius XII               8.06        7.72        8.17       8.33      7.83      8.02

Means                       7.68        7.25        6.79       7.60      7.04      7.27


The important results for our purposes are 
the two analyses of variance. Table 3 gives 
the results for the maximization group, 
Table 4 for the minimization group. Vari- 
ance due to raters is significant beyond the 
.01 level under both conditions. The largest 
variance is due to ratees or individuals being 
rated. This is the component of the halo ef- 
fect that is common to all raters. It is con- 
siderably smaller under the minimization con- 
dition. 

The relative halo effect is in the variance 
due to interaction between raters and ratees 
(R × I), shown in Part I of each table. This variance is small under both conditions, but
under maximization conditions it is signifi- 
cant beyond the .01 level of confidence. Un- 
der minimization conditions it does not reach 
the .05 level. This verifies the long-standing 
assumption that part of the halo effect is a 
phenomenon of judgment and also shows that 
the common recommendation about removing 
halo is effective. When the traits were rated 
one at a time, two or three days apart, the 
relative halo effect was not significant. 
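Each F in Part I of Tables 3 and 4 is a ratio of mean squares (variance estimates), with the within-sets term as the denominator. The sketch below recomputes the two rater × ratee F's from the tabled sums of squares and degrees of freedom and reproduces the printed values. Judging significance still requires the tabled critical F for 68 and 360 df, which this sketch does not compute. The function name is illustrative.

```python
def f_ratio(ss_effect, df_effect, ss_error, df_error):
    """F = (SS_effect / df_effect) / (SS_error / df_error)."""
    return (ss_effect / df_effect) / (ss_error / df_error)


# Rater x ratee interaction (relative halo), Part I of Tables 3 and 4.
f_max = f_ratio(365.68, 68, 1032.80, 360)  # maximization group
f_min = f_ratio(268.65, 68, 1273.60, 360)  # minimization group
print(f"{f_max:.2f} {f_min:.2f}")          # 1.87 1.12
```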

As a check on this difference between condi- 
tions, the data summarized in Part I of each 
table were put into an 18 × 5 × 2 design to


Table 3

Analysis of Variance of Ratings of Five Prominent Individuals on Five Traits by 18 College Students
Under Conditions that Maximize the Halo Effect

I. Ignoring Differences Between Traits

Source                 Sum of Squares   Degrees of Freedom   Variance Estimate      F       P
Between raters (R)          251.78              17                 14.81          5.16     .01
Between ratees (I)          276.16               4                 69.04         24.06     .01
Interaction (R × I)         365.68              68                  5.38          1.87     .01
Within sets                1032.80             360                  2.87
Total                      1926.42             449

II. Ignoring Differences Between Individuals

Source                 Sum of Squares   Degrees of Freedom   Variance Estimate      F       P
Between raters (R)          251.78              17                 14.81          4.05     .01
Between traits (T)          217.54               4                 54.39         14.86     .01
Interaction (R × T)         138.70              68                  2.03          0.56     n.s.
Within sets                1318.40             360                  3.66
Total                      1926.42             449




Table 4

Analysis of Variance of Ratings of Five Prominent Individuals on Five Traits by 18 College Students
Under Conditions that Minimize the Halo Effect

I. Ignoring Differences Between Traits

Source                 Sum of Squares   Degrees of Freedom   Variance Estimate      F       P
Between raters (R)          243.14              17                 14.30          4.04     .01
Between ratees (I)          167.99               4                 42.00         11.86     .01
Interaction (R × I)         268.65              68                  3.95          1.12     n.s.
Within sets                1273.60             360                  3.54
Total                      1953.38             449

II. Ignoring Differences Between Individuals

Source                 Sum of Squares   Degrees of Freedom   Variance Estimate      F       P
Between raters (R)          243.14              17                 14.30          3.96     .01
Between traits (T)           50.19               4                 12.55          3.48     .01
Interaction (R × T)         358.85              68                  5.28          1.47     .05
Within sets                1301.20             360                  3.61
Total                      1953.38             449


permit calculation of the triple interaction, 
raters × ratees × conditions. This interaction was significant beyond the .01 level. The interaction between ratees and conditions was also significant beyond the .01 level.


The interaction between raters and traits (R × T), shown in Part II of both tables, has
not attracted as much attention as the halo 
interaction. This interaction indicates the 
tendency of some raters to give high ratings 
on some traits and low ratings on other traits. 
As Guilford points out, this interaction could 
include a contrast effect referred to the self. 
That is, someone might rate everyone low on 
courage because he considers himself unusu- 
ally courageous. Or, in reverse, someone 
might rate everyone high on kindliness be- 
cause he is a kindly person himself. What- 
ever is included in this rater-trait interaction, 
it is affected by our conditions in just the op- 
posite way from the halo interaction. It is 
significant at the .05 level under the condi- 
tions designed to minimize halo and is hardly 
present at all under the maximization condi- 
tion. 

As a check on this difference, the data 
summarized in Part II of each table were put into an 18 × 5 × 2 design to permit calculation of the triple interaction, raters × traits ×
conditions. This interaction did not reach the 
.05 level of significance. 

Though this shift in R x T interaction be- 
tween conditions is not significant and was 
not expected in advance, it seems to fit in 
with the other results. When the rater judges 
one individual at a time on all traits, he man- 
ages to treat the traits in isolation but he 
gets personally involved with the individuals. 
Similarly, when the rater judges all individu- 
als on one trait at a time, he manages to 
treat the individuals separately but he gets 
personally involved with the traits. In re- 
ducing the relative halo or rater-ratee inter- 
action we have increased the rater-trait inter- 
action, in this sample at least. 

It is possible to state a generalization that 
encompasses both interactions. The over-all 
interaction is between raters and what they 
do on separate days. The rater can attend 
to, and judge separately, five items presented 
simultaneously, but when these items are presented a few days apart as aspects of a
larger task, he cannot treat them separately. 

The third interaction, between traits and ratees, computed by ignoring differences between raters, was significant beyond the .01
level under both conditions. It seems to be 
of no particular importance to the present 
discussion. 

Limitations 

Though we have used the term “maximiza- 
tion,” we have no experience or data to indi- 
cate that relative halo variance in Table 3 
is really maximal. Our selections of ratees, 
traits, and rating conditions were based on 
current hypotheses about halo, with no data 
to guide these selections. It is possible that 
halo would be larger if the ratings were made 
at intervals of one day, for example, or one 
hour. 

The same considerations apply to the small 
interaction variance between raters and traits, 
and the small difference in this variance be- 
tween conditions. It is doubtful that much 
error of any practical importance enters these 
ratings because of this interaction. Under 
other conditions, of course, this interaction 
may be larger and may shift more. The ef- 
fect of the judging conditions should there- 
fore be considered when anyone uses a rater’s 
ratings to get information about the rater. 


Summary 


The halo effect is supposed to be due to an 
error of judgment but no evidence has been 
published to prove that correlations between 
trait ratings are due to errors of judgment 
rather than objective correlations. The cor- 
rect method of investigation is experimental 
variation of the conditions of judgment and 
comparison of halo effects manifested under 
the different conditions. 

One group rated five well-known individu- 
als on five traits under conditions designed to 
maximize the halo effect. They rated one in- 
dividual each experimental day on all traits. 
The other group rated the same individuals on the same traits under conditions designed
to minimize the halo effect. They rated all 
individuals on one trait each experimental 
day. 

The relative halo effect was calculated, fol- 
lowing Guilford, as the variance due to inter- 
action between rater and ratee. Under maxi- 
mization conditions this interaction variance 
was significant beyond the .01 level. Under 
minimization conditions this interaction vari- 
ance was not significant at all. The between- 
ratee variance also was smaller under the 
minimization conditions. These results prove 
that halo is in part a phenomenon of judg- 
ment and that the common recommendation 
for reducing it by changing the conditions of 
judging is effective. 

When halo variance was reduced by this 
change in the judging task, variance due to 
interaction between raters and traits was in- 
creased, but this difference was not signifi- 
cant. 

In general, the rater attends to and dif- 
ferentiates the five items spread out in front 
of him whether they are names of individuals 
or traits. He interacts personally with these 
same items when they are presented at in- 
tervals of two or three days as aspects of a 
larger task. 


Received May 5, 1955. 


References 


1. Guilford, J. P. Psychometric methods. (2nd Ed.) New York: McGraw-Hill, 1954.

2. Johnson, D. M. A systematic treatment of judgment. Psychol. Bull., 1945, 41, 193-224.

3. Johnson, D. M. The psychology of thought and judgment. New York: Harper, 1955.

4. Symonds, P. M. Notes on rating. J. appl. Psychol., 1925, 9, 188-195.

5. Thorndike, E. L. A constant error in psychological ratings. J. appl. Psychol., 1920, 4, 25-29.

6. Wells, F. L. A statistical study of literary merit. Arch. Psychol., 1907, 1, No. 7.


The Journal of Applied Psychology 
Vol. 40, No. 2, 1956


Comparison of Two Visual Display Presentations¹


A. V. Churchill 


Defense Research Medical Laboratories, Toronto, Ontario 


The applicability of the results of numer- 
ous studies of dial legibility is dependent on 
the assumption that the presentation of slides 
on a screen is equivalent to the presentation 
of the actual dials. The present experiment 
was conducted in order to establish the com- 
parability of the two modes of presentation. 


Method 


Apparatus. Twelve 100 X 2, black on white dials, 
three inches in diameter, designated as types “A” 
and “B” in a previous study (1), were arranged in 
two simulated six-dial panels, with a spring-loaded 
pointer mounted on each dial. The panels were pre- 
sented in a vertical frame, at a 30-inch viewing dis- 
tance. The panels were viewed through an aper- 
ture, fitted with a shutter which the subject (S) 
operated by means of two microswitches. 

Slides made from photographs of the panels were 
projected onto a vertical screen which presented 6- 
inch diameter dials at a 60-inch viewing distance. 
As with the panels, the slide exposures were con- 
trolled by S. 


Table 1

Comparison of Slide and Panel Presentations

                                         Slides       Panels
Mean reading time per dial              1.46 sec.    1.42 sec.
% of readings in error                   3.92%        3.85%
Mean error per reading (scale units)    +0.056       +0.048
Errors of more than 10 units*             20           13

* Thirty-three errors of a magnitude greater than 10 were treated as errors of 10, i.e., 91 called 51, 91 called 9.


1 Defense Research Medical Laboratories Report 
No. 164-2, Project No. D77-94-20-27 (H.R. No. 84). 


The ambient illumination was 20 foot-candles; the illumination at the display was 70 foot-candles during presentation.

Subjects. The Ss were 50 University undergradu- 
ates, with normal vision, who were paid for their 
participation. 

Procedure. Forty-eight pointer positions were pre- 
sented to each S, twice on the panels and twice on 
the slides. In both test situations S was seated in 
front of the display and held a microswitch in either 
hand. At the “ready” signal S pressed the “on” 
switch, reported the six dial settings as quickly and 
as accurately as possible, and then pressed the “off” 
switch. Twenty seconds was allowed between the
reporting of the readings on one presentation and 
the “ready” signal for the following presentation. 
Panels preceded or followed slides for alternate Ss, 
and a different random order of settings, on both 
slides and panels, was presented to each S. 
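The counterbalancing above can be sketched as a per-subject schedule generator. Several details are not stated in the text and are assumptions here. The sketch runs each mode's two passes back to back and draws a fresh random order for every pass. It also splits the 48 settings into eight six-dial presentations. The function names, the seed, and the use of integers as stand-ins for pointer settings are illustrative.

```python
import random


def subject_schedule(subject_index, settings, seed=None):
    """Return the four passes (mode, ordered settings) for one subject.

    Even-index subjects see the panels first; odd-index subjects see
    the slides first. Each mode is run twice in a row, and every pass
    gets its own random order of the pointer settings.
    """
    rng = random.Random(seed)
    first, second = ("panels", "slides") if subject_index % 2 == 0 else ("slides", "panels")
    passes = []
    for mode in (first, first, second, second):
        order = list(settings)
        rng.shuffle(order)
        passes.append((mode, order))
    return passes


def split_into_presentations(order, dials_per_panel=6):
    """Group an ordered list of settings into six-dial presentations."""
    return [order[i:i + dials_per_panel] for i in range(0, len(order), dials_per_panel)]


settings = list(range(48))  # stand-ins for the 48 pointer positions
for mode, order in subject_schedule(0, settings, seed=1):
    print(mode, len(split_into_presentations(order)))
```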


Results and Conclusions 


Reading time and error data were recorded. 
The results are presented in Table 1. 

As will be seen from the table, there is a slight but nonsignificant advantage in favor of the panel presentation; the time advantage for panels (1.42 vs. 1.46 sec. per dial) yields a t of only .75.

The data gathered in the present study therefore lend justification to the procedure of applying slide-projection data directly to the actual displays which the slides are intended to represent.


Received May 9, 1955. 


Reference 


1. Churchill, A. V., & Allan, D. G. Experimental 
dial design. March, 1955. Defense Res. Bd., 
Canada, DRML Rep. No. 164-1. 




“Keep up” with reviews of the best,
current books in applied psychology— 


Read Contemporary Psychology 
The New APA Journal of Book Reviews 


Let CP provide you with the current, monthly news of books in the broad field of 
psychology, as well as in the various phases of applied psychology—now that the Journal 
of Applied Psychology is no longer publishing book reviews. 

—Every month CP publishes one long, critical, essay-type review of a particularly 
important psychological volume. 


—Every month CP presents many well-written, evaluative reviews of the foremost 
new volumes. 

—Every month CP includes valuable information on many phases of educational 
and industrial psychological films... the films themselves... film re- 
search . . . books about films. 


—Every month in “CP Speaks” the editor provides news and comment on psy- 
chology’s wide world of books. 


Read CP for the News in Books
$4.00 to APA members and participants in Student Journal Groups; $8.00 to all others
(foreign: add $.50)


American Psychological Association 
1333 Sixteenth Street, N. W. Washington 6, D. C. 


Journal of Applied Psychology
Vol. 40, No. 3, June, 1956

The Influence of the Spatial Positioning of Stimulus and Response Components on Performance of a Repetitive Key-Pressing Task¹

Norman H. Anderson, David A. Grant, and Charles O. Nystrom²
University of Wisconsin

This study investigates the relative efficiencies of a number of spatial positionings of a stimulus panel and a response keyboard used in a repetitive key-pressing task. In designing display-control arrangements it is generally considered best to place the visual display directly in front of the operator’s eyes and to have the hand controls conveniently centered in front of the operator and in direct stimulus-response correspondence with respect to the elements of the display (3, 4, 5, 6, 7, 9). In complicated display-control arrangements, as in an aircraft cockpit, the competition for the optimal space is critical, and it is frequently necessary to arrange display or control components in less than optimal positions and correspondence. It therefore becomes important to have some measure of the degradation of performance to be expected with suboptimal display-control arrangements. The present study compares eight suboptimal arrangements with the (presumably) optimal arrangement of a visual display and a finger-operated control board.

The general problem of the location of the work space has been dealt with by time-and-motion study engineers and has been adequately summarized by Chapanis, Garner, and Morgan (2, pp. 331-364; 9). There have also been studies of location discrimination (3), the compatibility or correspondence of responses to stimulus displays (5, 6, 7), and the effect of tilting key-pressing control panels (8). The present study involves comparisons of various locations of the complete visual display and the complete set of finger controls in continuous and intermittent psychomotor performance. In the present experiment nine arrangements of the stimulus panel and response keyboard were used. The stimulus panel occupied the right, left, and front positions relative to the operator, and the response keyboard occupied similar positions, independently. Time-and-error indices of operator efficiency were investigated as they were affected by the spatial positioning of these two components.

¹ This research was supported in part by the USAF under contract AF 18(600)-54 monitored by the Aero Medical Laboratory of the Wright Air Development Center, Wright-Patterson AFB, Ohio, and by the Research Committee of the Graduate School of the University of Wisconsin.

² Now at the University of Delaware.


Method


Apparatus. The Multiple Serial Discrimeter (MSD) used in this experiment is the same as that employed in previous experiments (6, 7), except for the substitution of a new light-touch keyboard with piano-like keys for the typewriter-like keys used in the earlier work. The MSD has been adequately described in earlier reports, so only a brief summary of its characteristics will be given here.

The stimulus display panel consists of a row of eight red stimulus lights lying directly above a row of eight green response information lights. Illumination of a subset of three red lights constitutes a stimulus pattern. The green lights are activated whenever the corresponding key on the response keyboard is pressed. A stimulus programming unit based on two Western Union tape transmitters produces the successive stimulus patterns from punched paper tapes. Under self-pacing (SP) the tapes are advanced .01 sec. after the operator has matched the current pattern. With automatic pacing (AP) the new pattern comes on after a preset interval. A 20-pen Esterline-Angus Graphic Operations Recorder gives a continuous record of stimuli and responses.
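The pattern and pacing rules above can be sketched as follows. Two details are assumptions: that a pattern counts as matched only when exactly the three lit lights' keys are held down, and that the AP interval is measured from the previous pattern's onset. The 6-sec. default interval is the approximate AP spacing reported for this experiment. All names are illustrative.

```python
from itertools import combinations

N_LIGHTS = 8

# Every possible three-light stimulus pattern on an eight-light row.
PATTERNS = list(combinations(range(N_LIGHTS), 3))


def is_matched(pattern, keys_down):
    """Assumed rule: matched when exactly the pattern's keys are down."""
    return set(keys_down) == set(pattern)


def next_onset(pacing, onset, match_time, interval=6.0, sp_delay=0.01):
    """Time (sec.) at which the next pattern appears.

    SP: the tape advances .01 sec. after the current pattern is matched.
    AP: the next pattern follows a preset interval after the last onset,
        regardless of when the match occurred.
    """
    if pacing == "SP":
        return match_time + sp_delay
    if pacing == "AP":
        return onset + interval
    raise ValueError("pacing must be 'SP' or 'AP'")


print(len(PATTERNS))                  # 56 possible patterns
print(is_matched((0, 2, 5), [5, 0, 2]))
```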

The operator sat in a chair 20 in. high, with a 
safety belt used to restrict his movements. Stimulus 
panel and response keyboard were mounted on small 
movable tables 30% in. high; with each arrange- 
ment the center of the response keyboard was 15 in. 
away from the center of the chair, and the stimulus 
panel was 25 in. distant from the center of the chair. 
The stimulus lights in the display were 40 in. above 
the floor. 

Design and procedure. The experimental design was a 9 × 9 Latin square with two replications. The nine treatments were the nine possible display-control positionings. These are denoted by pairs of the letters L, R, F (indicating the left, right, and front positions relative to the operator), with the first letter of a pair giving the position of the display panel and the second giving the position of the response keyboard. Within each treatment, operators matched two blocks of 25 patterns, one given under SP, the other under AP. The sequence of blocks was either AP, SP; SP, AP; AP, SP; ..., or SP, AP; AP, SP; SP, AP; .... When any operator received one of these sequences, his replicate received the other. In addition, an initial warm-up block was given under SP using the FF treatment.
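A 9 × 9 Latin square assigns each of the nine display-control treatments once to every operator and once to every ordinal position. The paper does not say how its particular squares were constructed. The cyclic square below is simply the easiest valid one, and the function names are illustrative. The block-sequence helper reproduces the alternating AP/SP pattern described above.

```python
from itertools import product

POSITIONS = "LFR"  # left, front, right

# First letter: display panel position; second: response keyboard position.
TREATMENTS = [display + keys for display, keys in product(POSITIONS, repeat=2)]


def cyclic_latin_square(items):
    """Row i is the item list rotated left by i positions.

    Rows stand for operators and columns for successive treatment
    slots; every item appears exactly once in each row and column.
    """
    n = len(items)
    return [[items[(i + j) % n] for j in range(n)] for i in range(n)]


def block_sequence(n_treatments, start_with_ap):
    """Order of the AP and SP blocks within each successive treatment.

    start_with_ap=True gives AP, SP; SP, AP; AP, SP; ...
    start_with_ap=False gives the complementary sequence for the replicate.
    """
    first, second = ("AP", "SP") if start_with_ap else ("SP", "AP")
    return [(first, second) if t % 2 == 0 else (second, first) for t in range(n_treatments)]


square = cyclic_latin_square(TREATMENTS)
print(square[0])  # first operator's treatment order
print(block_sequence(3, start_with_ap=True))
```

In practice the rows, columns, and treatment labels of such a square would usually be randomized before use. The cyclic form is kept here only for readability.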

Operators were strapped in the chair and given 
tape-recorded instructions. At the beginning of each 
block of patterns, the operator sat with hands on 
knees observing a fixation point 6 ft. high on a wall 
6 ft. distant. In response to a cue signaling the first pattern, he proceeded to match this pattern. Under
SP, he continued in action for the remainder of the 
block of patterns. Under AP, he resumed his initial 
posture until the cue signaled the next pattern. Two 
signal cues were available: watching the display ob- 
liquely, or listening for the distinct click emitted by 
the apparatus in advancing the new pattern. Op- 
erators generally preferred the latter cue. Time be- 
tween successive patterns for AP treatments was 
about 6 sec., which was practically always ample to 
allow both matching and resumption of initial posture. This interval was occasionally but unsystematically changed by as much as 2 sec. in order to avoid conditioning to the time interval. The 1-min.
interblock rest allowed sufficient time to change dis- 
play-control positioning when a new treatment was 
to be used. 

Subjects. The Ss or operators were 18 male stu- 
dents at the University of Wisconsin who had vol- 
unteered to serve as paid Ss at the Laboratory of 
Experimental Psychology. In order to reduce prac- 
tice effects, operators were selected unsystematically 
from those who had already served in an earlier ex- 
periment using the same apparatus in a nearly identi- 
cal task. Two additional conditions were imposed: 
(a) all operators were right-handed, and (b) all op- 
erators had scored under 50 sec. on each of the last 
four trials of matching a self-paced block of 25 3- 
light patterns in the earlier experiment. In all cases 
there was a 7-day interval between the operator's 
performance in the two experiments. 


Table 1

Mean Scores per Pattern for Response Times, Latencies, and Errors:
Each Mean the Average of 15 Responses from 18 Ss

                                   Location of      Location of Response Keyboard
Score                              Stimulus Panel   Left    Front   Right    (Error Variance)

Response time, automatic pacing    Left             1.84    1.93    2.19     (.058)
                                   Front            1.97    1.68    1.96
                                   Right            2.30    1.94    1.90

Latency, automatic pacing          Left             1.10    1.12    1.23     (.050)
                                   Front            1.22    0.98    1.16
                                   Right            1.31    1.15    1.16

Response time, self-paced          Left             1.44    1.41    1.60     (.036)
                                   Front            1.47    1.38    1.45
                                   Right            1.52    1.42    1.48

Errors per pattern,                Left             1.81    2.03    2.57     (1.145)
automatic pacing                   Front            1.79    1.91    1.93
                                   Right            2.31    1.83    1.93

Errors per pattern, self-paced     Left             1.17    1.04    1.36     (0.642)
                                   Front            1.19    1.08    1.26
                                   Right            1.50    1.20    1.47


Influence of Spatial Positioning of Stimulus and Response Components 


Results 


Only the last 15 stimulus patterns of each 
block of 25 were scored, the first 10 being 
used as warm-up. For both the SP and AP 
blocks, the response time (defined as time 
from onset of stimulus to the correct re- 
sponse) to the nearest .1 sec. and number of 
errors per 15-trial block were recorded. In 
addition, the latencies (from onset of stimu- 
lus to initial response) were measured for AP 
blocks. The 15 patterns were scored as a 
whole for the SP blocks. For AP blocks, the 
responses were read individually although the 
scores for the 15 patterns were used in the 
analysis. 

The numerical scores are summarized in 
Table 1. The five scores reported in Table 1 
consist of average response time, latency, and 
errors per pattern under AP, and average re- 
sponse time and errors per pattern under SP. 
The error variances for each score from the
9 × 9 latin squares are given in parentheses
in the right-hand column.

Since the angle between stimulus display and
control keyboard was considered the most 
important single physical variable, the time 
scores have been plotted in Fig. 1 as a func- 
tion of the angle between display and con- 
trol units. The angle has nominal values of 
0°, 90° and 180°. Each display-control con- 
dition is designated by the letter pair beside 
the data points in Fig. 1. The error data 
presented in Table 1 exhibit the same trend as 
the time measures but with less uniformity. 
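As a check on the grouping by angle, the AP response times in Table 1 can be averaged within each nominal separation. The sketch assumes that identical positions correspond to 0°, a front unit paired with a side unit to 90°, and left paired with right to 180°; the names are illustrative only.

```python
# AP response times from Table 1; keys are (stimulus panel, response keyboard).
AP_RESPONSE_TIME = {
    ("L", "L"): 1.84, ("L", "F"): 1.93, ("L", "R"): 2.19,
    ("F", "L"): 1.97, ("F", "F"): 1.68, ("F", "R"): 1.96,
    ("R", "L"): 2.30, ("R", "F"): 1.94, ("R", "R"): 1.90,
}


def nominal_angle(panel: str, keyboard: str) -> int:
    """Nominal angular separation, in degrees, between display and keyboard."""
    if panel == keyboard:
        return 0
    if "F" in (panel, keyboard):
        return 90   # one unit in front, the other to one side
    return 180      # left versus right


def mean_by_angle(scores):
    """Average the treatment means that share the same nominal angle."""
    groups = {}
    for (panel, keyboard), t in scores.items():
        groups.setdefault(nominal_angle(panel, keyboard), []).append(t)
    return {angle: sum(ts) / len(ts) for angle, ts in sorted(groups.items())}


for angle, m in mean_by_angle(AP_RESPONSE_TIME).items():
    print(f"{angle:>3} deg: {m:.3f} sec")
```

On these figures the means are 1.807, 1.950, and 2.245 sec. at 0°, 90°, and 180°, rising with angle as Fig. 1 shows.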

All scores were analyzed for treatment ef- 
fects, practice effects (columns) and indi- 
vidual differences (rows) for the pair of latin 
squares. Treatment effects were then further 
analyzed to evaluate effects of display placement,
control placement, and the display-control
placement interaction.³ Finally the
treatment means were ordered and subjected 
to the first step of the Tukey gap test (10), 
using the .01 confidence level, in order to 
determine which specific display-control arrangements
were superior to others. Because the error scores
showed significant variation only between operators,
their further analysis will not be discussed.

3 Analysis of variance summary tables for these
tests have been deposited with the American
Documentation Institute. Order Document No. 4725
from ADI Publications Project, Photoduplication
Service, Library of Congress, Washington 25, D. C.,
remitting in advance $1.25 for microfilm or $1.25
for photocopies. Make checks payable to Chief,
Photoduplication Service, Library of Congress.

[Figure: mean seconds per pattern plotted against angular separation of stimulus panel and response keyboard (0°, 90°, 180°), in an automatically paced panel (latency and response time) and a self-paced panel (response time). Points are labeled by letter pair, F = front, L = left, R = right, the first letter giving the stimulus panel position and the second the response keyboard position; e.g., FL = stimulus panel in front position, response keyboard in left position.]

Fig. 1. Variation of mean time per response pattern
with angular separation of stimulus panel and
response keyboard: response time and latency for
automatic pacing; response time for self-pacing.


AP Procedure 


Response time. The response time in- 
creases in Fig. 1 with increasing angle be- 
tween display panel and control keyboard. 
Within a given angle, however, the various 
procedures are reasonably homogeneous; for 
example, the RF, LF, FR, and FL averages 
at 90° lie within a one-tenth sec. interval. 
Effects of display panel positions, response 
keyboard position and their interaction are 
all significant (p < .001). Significant individual
differences (p < .001) and learning effects
(p < .001) were also obtained. The
Tukey gap test gave the following separation 
of means: 


FF < LL < RR < (LF, RF) < (FR, FL) < LR < RL.




Here, as below, a < sign indicates that the 
treatment on the left had a significantly lower 
mean than the treatment on the right. Pa- 
rentheses enclose treatments that are not sta- 
tistically separable from one another. The 
gap test shows that within a single angular 
separation of display and control elements it 
is more important to center the control key- 
board than to have the stimulus display in 
the optimal position. 
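The ordering on which the gap test operates can be reproduced by sorting the nine AP response-time means of Table 1. This sketch only sorts the means and lists the gaps between neighbors; it does not apply the Tukey criterion, which also needs the error variance and the test's critical values, so by itself it cannot decide which gaps are significant. The function names are illustrative.

```python
# AP response-time means from Table 1, keyed by letter pair
# (first letter = stimulus panel, second = response keyboard).
AP_RESPONSE_TIME = {"LL": 1.84, "LF": 1.93, "LR": 2.19,
                    "FL": 1.97, "FF": 1.68, "FR": 1.96,
                    "RL": 2.30, "RF": 1.94, "RR": 1.90}


def ordered_treatments(means):
    """Treatment labels sorted from lowest to highest mean."""
    return sorted(means, key=means.get)


def adjacent_gaps(means):
    """Differences between neighboring means in the ordered list."""
    order = ordered_treatments(means)
    return [(a, b, round(means[b] - means[a], 2)) for a, b in zip(order, order[1:])]


print(" < ".join(ordered_treatments(AP_RESPONSE_TIME)))
# FF < LL < RR < LF < RF < FR < FL < LR < RL
for a, b, gap in adjacent_gaps(AP_RESPONSE_TIME):
    print(f"{a} -> {b}: {gap:.2f} sec")
```

The raw order matches the sequence reported above; the smallest gaps (.01 sec. between LF and RF, and between FR and FL) fall within the pairs the test left unseparated.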

Latency. Although the same trends appear 
in the time to initial response or latency data 
as in the time to correct response data, the 
magnitudes of the differences are smaller. 
Aside from individual differences (p < .001), 
only keyboard position affected the latency 
score significantly (p < .05). The gap test 
gave the following separation of treatments: 


FF < LL < LF < (RF, RR, FR) < (FL, LR) < RL.


SP Procedure


The response time measures under the SP 
procedure show essentially the same features 
as response times under AP procedures. The 
range of variation is, however, considerably 
smaller. Aside from significant individual 
differences (p < .001) and the practice effect
(p < .05), only the position of the response
keyboard had a significant effect (p < .05). The
gap test gave the following
separation of treatments: 


FF < (LF, RF) < (LL, FR) < (FL, RR) < RL < LR.


Here, in contrast to the corresponding AP results,
some of the centered arrangements were not
superior to the 90° separations.


Discussion 


The results of the experiment show that 
the angular separation of the display and 
control units and the absolute position of the 
display and control units significantly affect 
the time scores under both the AP and the 
SP pattern-matching procedures. The abso- 
lute position of the response keyboard was 
more important than the absolute position of 
the stimulus display. The AP scores were 
about three times as sensitive as the SP scores 
to the angular separation of display and con- 
trol units. The percentage loss from best to 
worst spatial separation for SP was 10% to 
15% and for AP 30% to 40%. The greater 
difference in AP performance was presumably 
caused by the fact that the operator had first 
to respond to the cue and then to the stimu- 
lus pattern proper. In doing so he had to 
execute certain gross bodily movements. With 
SP these postural readjustments were not re- 
quired, and the cue and stimulus were identi- 
cal. The greater sensitivity of the AP pro- 
cedure to time degradation is of special sig- 
nificance because it approximates more closely 
the operating conditions in many practical 
monitoring situations. 
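The percentage losses can be recomputed from Table 1 by comparing FF with the two 180° arrangements, LR and RL. This is a minimal sketch under the assumption that these are the best and worst spatial separations meant in the text; the names are illustrative.

```python
# Best (FF) and the two 180-degree treatments (LR, RL), from Table 1.
TIMES = {
    "AP": {"FF": 1.68, "LR": 2.19, "RL": 2.30},
    "SP": {"FF": 1.38, "LR": 1.60, "RL": 1.52},
}


def percent_loss(best: float, worse: float) -> float:
    """Increase in time relative to the best arrangement, in per cent."""
    return 100.0 * (worse - best) / best


for mode, t in TIMES.items():
    losses = sorted(percent_loss(t["FF"], t[k]) for k in ("LR", "RL"))
    print(mode, [f"{x:.1f}%" for x in losses])
```

On these figures the losses come to about 30% and 37% for AP and about 10% and 16% for SP, close to the rounded ranges quoted above.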

The AP response times of Fig. 1 show a 
degradation of .6 sec. in going from most to 
least favorable treatment. Comparing these 
data with the latencies, it is seen that half
this loss is due to increased latency, with the 
other half arising in the manipulative process 
itself. It should be noticed that this was not 
accompanied by a significant change in num- 
ber of errors. 
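The split of the .6-sec. degradation can be checked in the same way, taking FF as the most favorable and RL as the least favorable treatment for both measures, as Table 1 shows them to be. A minimal sketch:

```python
AP_RESPONSE = {"best": 1.68, "worst": 2.30}   # FF and RL, Table 1
AP_LATENCY = {"best": 0.98, "worst": 1.31}    # FF and RL, Table 1

total = AP_RESPONSE["worst"] - AP_RESPONSE["best"]       # about .62 sec.
latency_part = AP_LATENCY["worst"] - AP_LATENCY["best"]  # about .33 sec.
manipulative_part = total - latency_part                  # remainder at the keyboard

print(f"total {total:.2f} s, latency {latency_part:.2f} s "
      f"({latency_part / total:.0%}), manipulation {manipulative_part:.2f} s")
```

The latency difference accounts for roughly 53% of the total, in line with the statement that about half the loss is due to increased latency.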

When the above results are compared with 
the results of earlier studies in this series (6, 
7) and those of Fitts and Seeger (5), it is 
seen that a less preferred or suboptimal spa- 
tial positioning of display and control key- 
board produces a much smaller degradation in 
reaction time and response accuracy than re- 
sults from interference with the natural cor- 
respondence between the stimulus and re- 
sponse components. Increases of response 
time from 10% to 40% were found in this 
experiment. Linear transposition of stimulus 
and response elements and disturbance of 
the angular correspondence between stimulus 
lights and response keys were found to give 
time increases as great as 1500% and 100%, 
respectively. Hence, if departures from opti- 
mal display-control relations are required, it 
is probably best to move the display unit or 
the keyboard laterally as units rather than to 
rearrange the individual keys or response ele- 
ments. 

Summary 

Results are reported of an experiment in- 
vestigating operator efficiency in a key-press- 
ing task as a function of spatial positioning 
of the stimulus panel and response keyboard. 
The stimulus panel and the response key- 
board occupied positions that were to the left, 
right, or in front of the operator. The nine 
possible combinations of positions of stimu- 
lus display and response keyboard were used 
as treatments, using a balanced experimental 
design on 18 Ss. Two modes of stimulus 
presentation were employed within each treat- 
ment: under self-pacing, S kept his fingers on 
the response keyboard, matching the stimulus 
patterns which succeeded one another as fast 
as they were matched; under automatic pac- 
ing, S returned to a rest position between 
matching successive patterns which were pre- 
sented approximately 6 sec. apart. 

Five sets of scores were taken. Response 
time and number of key presses (an error 
index) were measured in both automatic pac- 
ing and self-pacing. In addition, latencies 
were measured in the automatic-paced procedure.

The following results were obtained: 

1. With the self-paced procedure, response 
times were 10% to 15% greater when the 
stimulus and response units were on opposite 
sides of the S than for the optimal arrange- 
ment where both units were in front of S. 
The corresponding increase for automatic 
pacing was 30% to 40%. 

2. For automatic pacing, half of the de- 
crease in efficiency arose in the manipulatory 
process at the keyboard. The other half was 
associated with the additional movements 
necessary in the less efficient treatments. 

3. No significant differences in errors were 
observed among the various treatments. 

4. Position of the response keyboard ex- 
erted a significant effect on all three time 
measures, the centered position being pre- 
ferred, and the left position giving poorest 
results. For automatic pacing, the position 
of the stimulus panel and its interaction with 
the response keyboard were also significant 
factors, the front position being best and the 
right position poorest. 

5. For each time measure, the different 
treatments showed considerable separation 
when tested with the Tukey gap test. Gen- 
erally speaking, placement of response key- 
board was more important than location of 
the display. 
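Item 4 can be illustrated for the automatic-pacing response times by averaging Table 1 over rows and over columns. The sketch computes only marginal means; the significance statements rest on the authors' analysis of variance, which is not reproduced here, and the names are illustrative.

```python
# Rows: stimulus panel position; columns: response keyboard position (Table 1).
POSITIONS = ("L", "F", "R")
AP_RESPONSE_TIME = [
    [1.84, 1.93, 2.19],  # panel left
    [1.97, 1.68, 1.96],  # panel front
    [2.30, 1.94, 1.90],  # panel right
]


def row_means(table):
    """Mean for each stimulus panel position."""
    return {pos: sum(row) / len(row) for pos, row in zip(POSITIONS, table)}


def column_means(table):
    """Mean for each response keyboard position."""
    return {pos: sum(col) / len(col) for pos, col in zip(POSITIONS, zip(*table))}


panel = row_means(AP_RESPONSE_TIME)
keyboard = column_means(AP_RESPONSE_TIME)
print("panel:", {k: round(v, 3) for k, v in panel.items()})
print("keyboard:", {k: round(v, 3) for k, v in keyboard.items()})
```

For these scores the front position gives the lowest mean for both units, the left keyboard position the highest keyboard mean, and the right panel position the highest panel mean.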

The present results are contrasted with increases
in response time as great as 1500%
obtained in the previous experiments of this 
series where the effects of interfering with 
natural angular and linear correspondences of 
individual stimulus and response elements 
were investigated. 


Received June 14, 1955. 


References


1. Barnes, R. M. Motion and time study. (3rd Ed.)
New York: Wiley, 1949.

2. Chapanis, A., Garner, W. R., & Morgan, C. T.
Applied experimental psychology. New York:
Wiley, 1949.

3. Fitts, P. M. A study of location discrimination
ability. In P. M. Fitts (Ed.), Psychological
research on equipment design. Washington:
U. S. Government Printing Office, 1947. Pp.
207-217. (AAF Aviat. Psychol. Program Res.
Rep. No. 19.)

4. Fitts, P. M. Engineering psychology and equipment
design. In S. S. Stevens (Ed.), Handbook
of experimental psychology. New York:
Wiley, 1951. Pp. 1287-1340.

5. Fitts, P. M., & Seeger, C. M. S-R compatibility:
spatial characteristics of stimulus and
response codes. J. exp. Psychol., 1953, 46,
199-210.

6. Morin, R. E., & Grant, D. A. Learning and performance
on a key-pressing task as a function
of the degree of stimulus-response correspondence.
J. exp. Psychol., 1955, 49, 39-47.

7. Nystrom, C. O., & Grant, D. A. Performance
on a key-pressing task as a function of the
angular correspondence between stimulus and
response elements. Percept. Mot. Skills, 1955,
1, in press.

8. Scales, Edyth M., & Chapanis, A. The effect on
performance of tilting the toll-operator’s keyset.
J. appl. Psychol., 1954, 38, 452-456.

9. Stellar, E. Human factors in panel design. In
Human factors in undersea warfare. Washington:
National Research Council, 1949. Pp.
153-176.

10. Tukey, J. W. Comparing individual means in
the analysis of variance. Biometrics, 1949, 5,
99-114.


The Journal of Applied Psychology 
Vol. 40, No. 3, 1956 


An Investigation of the Shape of Learning Curves for 
Industrial Motor Tasks¹


Jean Grove Taylor 
Johns Hopkins University Operations Research Office 


and Patricia Cain Smith 
Cornell University 


Learning curves are used in industry for a 
variety of purposes, including monetary incen- 
tives and evaluation of the learners’ progress. 
The standard or model curve is generally one 
which has been either drawn “intuitively” or 
copied from published curves, the validity of 
which has been assumed rather than empiri- 
cally established for the tasks involved. The 
purpose of this study is to determine whether 
there is a “typical” learning curve for tasks 
differing greatly in degree of complexity and 
learning time, but performed under similar in- 
centive conditions. 

Short periods of learning and temporary 
conditions of motivation have been charac- 
teristic of laboratory studies (2, 6, 10, 12, 
15). A satisfactorily representative curve 
should be based on a period of learning suffi- 
ciently long to ensure that the learning curve 
has reached a plateau. Motivation to learn 
should not be reduced at any point by restric- 
tive codes or “loose” standards which might 
introduce motivational plateaus, and group 
rather than individual data should be used 
for the sake of reliability. Field investiga- 
tions have been reported for motor tasks such 
as typewriting (4), telegraphy (5), hosiery 
looping (16), and textile-machine operations 
(3). Although a few of these studies are 
based on group data, none has taken into ac- 
count individual differences in total time to 
reach some criterion of learning. A typical 
error in analysis has been to average the pro- 
duction figures attained at the end of some 
period, e.g., the fifth week, for a group of 


1 This analysis was carried out as part of a mas- 
ter’s thesis by Jean Taylor, and was performed at 
Cornell under the direction of Patricia Smith. The 
authors wish to express their gratitude to Kurt 
Salmon Associates, and to their client who prefers 
to remain anonymous, for their cooperation in obtaining
the records upon which the study was based.


workers even though the time required to 
learn to a criterion varied for individual work- 
ers from ten to twenty weeks. This pro- 
cedure distorts the shape of the resultant 
curve since the rates of learning are very dif- 
ferent for the different learners at different 
percentages of total learning time. 

In this investigation, learning curves were 
compared for relatively simple tasks involving 
a fixed motor sequence with those for tasks 
involving increasing degrees of complexity 
and requiring continuous and varied adjust- 
ments. 

The Material 


The material for this investigation was ob- 
tained from a non-unionized factory in the 
South engaged chiefly in the production of 
boys’ and men’s dungarees and overalls. All 
of the standards had been set during 1940 and 
1941 by a firm of consulting engineers and re- 
mained constant throughout the period of 
this study. The piece-rate system was sup- 
plemented by a guaranteed minimum wage, 
and in most cases by a learners’ bonus plan.²
Payment during the learning period proceeded 
stepwise from a wage in the first week equal 
to about 60 per cent of the final base rate to 
a wage of 100 per cent of the base rate in the 
last week (at 100 per cent productivity, or 
standard). The learner was furnished with 
a learners’ curve as a guide to the percentage 
of standard that should be produced on the 
specific job each week in order to obtain 
learners’ bonus wages. These curves were 
drawn by the consulting engineers without 


2 Thirteen of the 70 curves used in the study were 
obtained from operators trained after the 75-cent 
minimum wage law went into effect, which change 
was followed by temporary abandoning of the learn- 
ers’ plan. Inspection of these curves showed no sys- 
tematic differences from those of workers trained 
previous to this change. 




Shape of Learning Curves for Industrial Motor Tasks 




Table 1

Number of Curves, Range of Weeks and Range of Percentage Productivity, Median Number of Weeks and
Median Percentage Productivity at the First Week of Initial Plateau

                                             Figures at First Week of Initial Plateau
                                     No. of   Range of Weeks     Median   Range %       Median %
Job                                  Curves   to First Plateau   Weeks    Production    Production

Main Study
Tacking (line shaft)                    6        11-26             18       91-123        110
Face front pockets                     11         8-27             17       75-120         86
End finish (or finish band ends)        5         7-19             10       91-110        100
Attach flys                             7         9-22             20       87-120         98
Fell in and out seams                   9        10-26             16       78-120         93
Attach back pocket                      5        15-23             16       81-97          90

Check Study
Hem suspenders                          4        10-29             22       94-120        104
Tacking (individual motor)              6         9-15             11       80-104         94
Bander                                  4        11-19             16½      98-122        117
Bottom hem                              4        16-23             18       84-107         93.5
Hem watch pocket and attach label       4        10-17             13       74-83          80.5
Serge front pocket                      5        10-18             15       80-95          85


detailed analysis of the obtained curves in 
the plant. The length of the training period 
on several key jobs was determined by the 
average time to reach standard. The shape 
of the curve was, with but one exception, the 
same for all jobs. 

Special “personnel supervisors” who knew 
the operations thoroughly had the responsi- 
bility for on-the-job training of new workers. 
One group of jobs was specifically assigned to 
each supervisor. The trainers were about 
equal in experience, had been given approxi- 
mately equal and similar instruction in train- 
ing methods, and were closely supervised by 
a general supervisor. Motivational and learn- 
ing conditions were thus fairly comparable 
from task to task. 

Learning curves were obtained for 189 op- 
erators trained between the years of 1945 and 
1949. Curves were eliminated for operators 
who had left the plant before the end of the 
official learning period set for the job or soon 
thereafter, and for jobs which had fewer than 
four trainees, leaving 70 usable curves. 


Procedure and Results 


Modified Vincent curves were constructed 
for each of six jobs, which varied widely in 
difficulty and in the extent to which the tasks 
required continuous adjustment by the worker, 
and for the composite of the six jobs. The 
jobs chosen are shown in Table 1. Each of 
the six job classifications for machine opera- 
tions is included, representing six levels of 
difficulty. The results for these jobs were 
checked for six remaining jobs, similarly dif- 
fering in difficulty. The procedure for this 
analysis was as follows: 

Criterion of learning. The determination 
of the point at which learning terminates poses 
several problems. As Fig. 1 illustrates, the 
curves may continue to rise for a long time. 
There are, in fact, many cases in which in- 
creases continue to take place for as long as 
two years. In those cases where data are 
available, increases are apparent for three or 
four years. Verbal reports of operators on 


3 Space limits the presentation of figures to only
three of the original thirteen. The original thesis is 
filed in the Cornell University Library. 


144 Jean Grove Taylor and Patricia Cain Smith 
[Figure: individual learning curves for Hem Watch Pocket and Attach Label and for End Finish, plotted against weeks of production (0 to 100).]
Fig. 1. Individual learning curves for two jobs, showing period of initial plateau.


similar sewing jobs indicate that operators 
believe that they continue to learn through- 
out this period, both in terms of minor im- 
provements of method and improved “feel” 
for the task. After the initial plateau which 
appears after the first sharp rise of the learn- 
ing curve, further plateaus may occur re- 
peatedly throughout an individual curve, and 
quite characteristically do so, although they 
may appear for some individuals on a par- 
ticular job and not for others. For the sake 
of consistency, therefore, we chose the be- 
ginning of the first plateau, the period of 
initial leveling, as our criterion of 100 per 
cent learning time. The first of each pair of 
X’s in Fig. 1 represents that point. 

Each of two judges independently chose this 
criterion point for 43 learning curves, indicat- 
ing the first week followed by at least seven 
weeks in which production did not increase 
appreciably. No arbitrary limit was set on 
the amount of increase; instead, the total 
picture of the curve was used. The results 
showed, however, that the maximum increase 


over the initial point was 3 per cent for the 
level period. The second X of each pair in 
Fig. 1 represents the termination of this pe- 
riod. The two judges agreed within one week 
concerning the first week of the plateau in 75 
per cent of the cases, with an average of only 
two weeks’ difference in all cases. After dis- 
cussion, the period of the initial plateau for 
each of the learning curves was finally deter- 
mined by joint agreement of the two judges. 
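The judges worked from the whole curve rather than from a fixed rule, but their criterion can be approximated mechanically. The sketch below is hypothetical: it assumes a level period of seven weeks and treats the observed maximum rise of 3 per cent as a threshold in percentage points of standard; the function name, both parameters, and the sample curve are illustrative, not taken from the study.

```python
from typing import Optional, Sequence


def first_plateau_week(production: Sequence[float],
                       level_weeks: int = 7,
                       max_rise: float = 3.0) -> Optional[int]:
    """Return the 1-based week that begins the initial plateau, or None.

    A week qualifies when each of the next `level_weeks` weeks exceeds its
    production by no more than `max_rise` points (per cent of standard).
    """
    for i, start in enumerate(production):
        following = production[i + 1 : i + 1 + level_weeks]
        if len(following) < level_weeks:
            return None  # not enough later weeks left to judge a plateau
        if all(p - start <= max_rise for p in following):
            return i + 1
    return None


# Hypothetical weekly production, in per cent of standard.
curve = [40, 55, 65, 72, 78, 83, 88, 90, 91, 90, 92, 91, 93, 92, 91, 95, 98]
print(first_plateau_week(curve))  # 8
```

Week 7 (88 per cent) is rejected because production later reaches 93, five points higher; Week 8 (90 per cent) is followed by seven weeks that never exceed 93.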

Plotting of individual curves. The rate of 
learning and the percentage productivity at 
the end of initial learning vary from indi- 
vidual to individual. The curves were first 
converted to a common scale on one axis on 
the basis of learning time. For each indi- 
vidual on a given job, percentage produc- 
tivity (in terms of the standard production 
set by engineers) was plotted against the per- 
centage of that individual’s total learning time 
which had been completed at a given week. 
For example, if in Week 7 the individual had 
a production record of 70 per cent of stand- 
ard production and his total learning time 

was 21 weeks, the value was plotted at 70
per cent on the vertical axis and 33⅓ per cent
on the horizontal axis. These charts are
plotted in terms of percentage of engineering
standard and percentage of learning time (to
first plateau).

The curves for any one job show no systematic
differences in shape. As shown in Table 1, the
number of weeks required to reach the first week
of the initial plateau ranged from 7 to 27 weeks
for individual learners on the six jobs. Likewise,
the level of production reached at the first week
of the initial plateau ranged from 75 per cent to
123 per cent productivity. Although there was great
deviation with respect to these two variables,
it was evident from inspection of the curves
that the shapes of the learning curves for
individuals on the same job were much the same,
thus justifying the use of a median curve to
represent “typical” progress for that task.

[Figure: composite curves equated for differential % productivity; % of attained proficiency (0 to 100) plotted against % of learning time (0 to 100) for Tacking, Finish Band Ends, Attach Flys, Fell In and Out Seams, Attach Back Pocket, and Face Front Pocket.]
Fig. 2. Composite prorated curves of main study.
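The conversion used in plotting the individual curves (Week 7 of a 21-week learning time plotted at 33⅓ per cent) is a simple proportion. A sketch with hypothetical function names, using exact fractions so that values such as 33⅓ are not rounded:

```python
from fractions import Fraction


def percent_of_learning_time(week: int, total_learning_weeks: int) -> Fraction:
    """Share of an individual's learning time (to first plateau) completed by `week`."""
    return Fraction(100 * week, total_learning_weeks)


def rescale_curve(production, total_learning_weeks):
    """Pair each week's % of standard with % of that learner's learning time."""
    return [(percent_of_learning_time(w, total_learning_weeks), p)
            for w, p in enumerate(production, start=1)
            if w <= total_learning_weeks]


print(percent_of_learning_time(7, 21))  # 100/3, i.e. 33 1/3 per cent
```

Weeks after the first week of the initial plateau are dropped here, on the assumption that only the learning period itself is rescaled.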

Construction of composite curves for each 
job. The composite curve for each job was 
constructed from the individual curves by 
computing, at each of the 10 per cent divi- 
sions of learning time, the median of the 
percentage productivity figures from the indi- 
vidual curves. These median production fig- 
ures during the period of the initial plateau 




Table 2

Ranges of the Composite Curves at Termination of
Each Tenth of Learning Time for the
Main and Check Studies

% of             Ranges of % of Proficiency at Criterion
Learning Time    Main Study       Check Study
10               33-51            38-44
20*              52-64            53-65
30               62-72            62-72
40*              67-80            66-75
50*              70-85            69-79
60               74-86            75-84
70*              79-89            77-87
80*              82-92            80-90
90               88-95            90-95

* Percentage of learning time where range of check study is
not within range of main study.


were found to vary from 86 per cent to 110 
per cent on the different jobs (see Table 1). 
To make these composite curves comparable, 
they were again prorated, this time on the 
other axis, in units representing percentages 
of the median production during the initial 
plateau, as shown in Fig. 2. The percentage 
productivity of each curve at 100 per cent 
learning time (beginning of first plateau) was 
designated as 100 per cent proficiency for 
that job, and each curve was redrawn in 
terms of percentages of that figure. 
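The two proratings (median at each tenth of learning time, then division by the value at 100 per cent learning time) might be sketched as follows. Linear interpolation between weekly points stands in for the connecting of adjacent points that the text describes; the sketch assumes every individual curve has already been rescaled to per cent of learning time and spans the 10 to 100 per cent range. The function names and sample curves are invented for illustration.

```python
from statistics import median


def value_at(curve, x):
    """Linear interpolation on a curve given as (x, y) points sorted by x."""
    for (x0, y0), (x1, y1) in zip(curve, curve[1:]):
        if x0 <= x <= x1:
            if x1 == x0:
                return y0
            return y0 + (y1 - y0) * (x - x0) / (x1 - x0)
    raise ValueError(f"x={x} lies outside the curve")


def composite_curve(curves, steps=range(10, 101, 10)):
    """Median % productivity across learners at each tenth of learning time."""
    return {s: median([value_at(c, s) for c in curves]) for s in steps}


def prorate(composite):
    """Re-express the composite as % of its value at 100% learning time."""
    base = composite[100]
    return {s: 100.0 * v / base for s, v in composite.items()}


# Hypothetical learners: (% of learning time, % of standard).
c1 = [(0, 20), (50, 60), (100, 100)]
c2 = [(0, 40), (50, 80), (100, 120)]
proficiency = prorate(composite_curve([c1, c2]))
print({s: round(v, 1) for s, v in proficiency.items()})
```

After prorating, every job's composite ends at 100 per cent proficiency, which is what allows curves from jobs with different plateau productivity to be overlaid.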

The most striking feature of these com- 
posite job curves is their similarity. For all 
jobs, there is a noticeably high percentage of 
proficiency attained at the end of 20 per cent 
of learning time, with fairly regular increases 
thereafter. From 90 per cent to 100 per cent 
of learning time there is, in all cases, a sharp 
increase in percentage of attained proficiency. 
This is due in part to the fact that the median 
production figure at this 100 per cent point 
was not based on the prorated part of the 
curves, which usually involved some smooth- 
ing, since adjacent points were connected. 
The end of the curve was determined instead 
by actual production figures of the first week 
of the initial plateau which had been plotted 
for the individual curves. The terminal rise 
may also be due to the fact that the criterion 
of learning was defined as the first week of 
the period of initial leveling, a high point of 


the curve which was followed by no appreci- 
able increase in production. In many cases 
this was a point where considerable increase 
in production over the preceding point had 
been recorded. Although choice of this point 
distorts the shape of the curve at the end, the 
shape of the rest of the curve should not have 
been distorted by this method. 

The small differences between curves were 
compared to determine whether they were re- 
lated to task characteristics such as com- 
plexity and amount of adjustment required 
of the operator to perform a motor pattern. 
Tacking, which involves feeding a semi-auto- 
matic machine, represents a comparatively 
simple task involving a fixed motor sequence. 
A more complex motor pattern and a much 
greater amount of adjustment are required to 
perform Felling In and Out Seams. While 
the operator is felling the seams she must 
constantly feed, hold back, and readjust the 
material in her lap and arms. The remain- 
ing jobs range from the simpler to the more 
complex in both motor pattern and degree of 
adjustment required. The median job curves 
presented in Fig. 2 overlap each other and 
follow one general shape, instead of maintain- 
ing a fixed position relative to each other. In 
view of this, it would seem that the curves 
could not be differentiated on the basis of 
complexity of job, or amount of adjustment 
required. 

Check study. An additional study was un- 
dertaken to determine whether these results 
would stand up under the analysis of six more 
jobs. Twenty-seven curves on six additional 
jobs, comparable in all respects to those of 
the main study, were analyzed in the same 
manner. (See the last half of Table 1.) 
Ranges of the percentages of the median 
production during the initial plateau which 
were attained for each of the composite curves 
were computed at the point terminating each 
tenth of the learning period. These ranges 
for this check study differed from the ranges 
of the main study by 2 per cent or less at any 
10 per cent point. (See Table 2.) The gen- 
eral shapes of learning curves in the check 
study were also in agreement with those of 
the main study.⁴


4 See footnote 3. 
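The comparison behind Table 2 can be made explicit by measuring how far each check-study range extends beyond the corresponding main-study range. A sketch with illustrative names:

```python
# (low, high) ranges of % of attained proficiency, from Table 2.
MAIN = {10: (33, 51), 20: (52, 64), 30: (62, 72), 40: (67, 80), 50: (70, 85),
        60: (74, 86), 70: (79, 89), 80: (82, 92), 90: (88, 95)}
CHECK = {10: (38, 44), 20: (53, 65), 30: (62, 72), 40: (66, 75), 50: (69, 79),
         60: (75, 84), 70: (77, 87), 80: (80, 90), 90: (90, 95)}


def excursion(main, check):
    """Points by which the check range lies outside the main range (0 if inside)."""
    lo_m, hi_m = main
    lo_c, hi_c = check
    return max(0, lo_m - lo_c, hi_c - hi_m)


for pct in MAIN:
    e = excursion(MAIN[pct], CHECK[pct])
    flag = "*" if e else ""
    print(f"{pct}%{flag}: {e}")
```

The largest excursion is 2 points, at 70 and 80 per cent of learning time, and the nonzero excursions fall at exactly the starred rows of Table 2.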


Construction of composite curves for all
jobs. The “typical” curve for the combination
of all six of the jobs in the main study
was derived by taking the median percentage
of attained proficiency for each 10 per cent
of learning time of the six composite curves.
This composite is shown in the solid line of
Fig. 3. Similarly, the composite was constructed
for the check study, and appears as
the broken line in the figure. Inspection of
the two curves demonstrates their essential
similarity. Both curves are negatively accelerated,
but with a large portion of the curve
showing a nearly linear rise, extending from
20 to 80 per cent of the total learning time.
These curves were compared with the curves
used as guides by the learners; they are
clearly different in shape.

[Figure: median learning curves; median % of attained proficiency plotted against % of learning time for the main study (solid line) and the check study (broken line).]
Fig. 3. Composite curves for all jobs, main and check studies.


Discussion 


There is no evidence in the present study 
to suggest that differences in the complexity 
of the task, or in the degree of adjustment 
required, influence the general shape of the 
learning curve within the group of tasks 
studied here. There are, however, two char- 
acteristics of the obtained curves which re- 
quire explanation: the nearly rectilinear rise 
in the period immediately following the sharp 
initial increase, and the continuation of the 
increases for long periods of time after the 
first plateau. The rectilinear portion of the 
curve does not appear in the average curves 
obtained in previous field studies, where a 
different principle of averaging was used, al- 
though inspection of individual data sug- 
gests that it is there in some of the more 
complex tasks previously investigated (11). 
Laboratory data do not usually include this 
portion of the curve except for quite simple 
tasks.

We propose that both the rectilinear phase 
and the continued rise may be explained in 
terms of shifts in the abilities required to 
learn and to perform the parts of a complex 
task as it is learned. Correlations between 
performance before and after learning may 
be quite low (3, 9, 11, 14). Shifts in the 
factorial composition of complex tasks have 
been shown by Fleishman and Hempel, with 
decreases in loadings of verbal comprehension 
(8), perceptual speed (7), mechanical experi- 
ence (7), and spatial relations (7, 8), and 
increases in loadings for reaction time (8), 
motor speed (7, 8), and factors specific to 
the task (7, 8), as learning progressed. Un- 
published work by one of the authors shows 
a sharp decrease in validity of visual tests for 
prediction of criteria after the learning pe- 
riod, rather than during learning, for power 
sewing-machine operators. 

On the basis of such evidence, combined 
with observation of and discussion with work- 
ers at various stages of learning on such jobs, 
we suggest that the obtained learning curves 
include three sections, in each of which dif- 
ferent abilities are of primary importance, 
and different processes are taking place. 

1. The initial sharp rise, which represents
the first 20 per cent of the learning time, is
one in which the relationships of the parts of
the garment and the principal components of 
the task are learned. Errors of procedure can 
be observed visually by both trainer and 
learner, so that they can be corrected. Here 
spatial, verbal, and perceptual factors may 
well be important, while kinesthetic and mo- 
tor speed abilities have relatively little effect 
on performance. 


2. After this period of rapid familiarization 
with the nature of the task, the curve becomes 
nearly rectilinear. Here increments of per- 
formance are much smaller, and almost equal 
to one another. Workers report that they are 
getting the “feel” of the task; this kinesthetic 
experience cannot be communicated verbally, 
checked visually, or induced except as the 
learner introduces variations in movements 
which are followed by discriminable improve- 
ment. This process must necessarily be slow. 
We believe that an ability (or a set) to at- 
tend to kinesthetic cues and to relate them to 
errors in performance is important in deter- 
mining individual differences in achievement 
at this time, and that this is the specific vari- 
ance found by Fleishman and Hempel (8), 
rather than an integration ability, as they 
suggest. Visual factors become less impor- 
tant as kinesthetic factors become dominant. 

Motor speed is also important, and begins 
to limit performance at this time. Several 
studies (e.g., 1, 13) have shown that the times 
of long “travel” movements show less im- 
provement with succeeding trials than do ma- 
nipulative movements. These differences also 
appear in Fleishman and Hempel’s data (8, 
p. 309). Possibly because of a higher initial 
level of practice for these grosser movements, 
they reach their maximum speeds early in the 
learning period. Much of the variance in 
total speed of performance during this period 
is probably due to the speed of these move- 
ments. It is during this section of the learn- 
ing period that performance begins to cor- 
relate fairly highly with performance after 
learning (14). 

3. In this section the curve reaches a 
plateau. Further increases are probably due 
to changes in motivation and attention, as 
well as to improvements in method of ma- 
nipulation. Motor speed, we suggest, is a ma- 
jor determiner of individual differences in 
level of performance, with sensitivity to kin- 
esthetic feedback contributing a minor por- 
tion of the variance. 

These suggestions should be checked by 
comparison of the changes in shapes of curves 
with changes in the interrelationships of tests 
and performances throughout very long learn- 
ing periods such as those obtained in the pres- 




Shape of Learning Curves for Industrial Motor Tasks 


ent investigation, and by further analysis of 
the shapes of curves obtained in laboratory 
learning situations using more complex tasks 
and extending learning periods further than 
in previous studies. Although the same curve 
applied to all of the tasks in the present 
study, one cannot assume that this obtained 
curve can be safely generalized to other learning situations.


Summary and Conclusions 


Seventy learning curves from operators on 
twelve power sewing-machine operations were 
analyzed. Using the period of initial plateau 
as a criterion of learning, modified Vincent 
curves were established for each job, and 
separate composite curves for each of two 
groups including half the jobs. 
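The modified Vincent procedure is not spelled out in this section. For readers unfamiliar with Vincent curves, the sketch below shows only the generic idea, under the assumption that each worker's record runs from the start of training to that worker's own criterion point (here, the initial plateau): every record is rescaled to the same number of equal fractions of its own learning period, and the rescaled records are then averaged fraction by fraction. The function names and the data are hypothetical, and the authors' particular modification may differ.

```python
from typing import List, Sequence


def vincentize(series: Sequence[float], n_points: int = 10) -> List[float]:
    """Resample one learner's record (start of training up to the learning
    criterion) at n_points equally spaced fractions of that learner's own
    learning period, interpolating linearly between observations."""
    if len(series) < 2:
        raise ValueError("need at least two observations")
    last = len(series) - 1
    resampled = []
    for k in range(1, n_points + 1):
        pos = k * last / n_points       # fractional position in the record
        i = min(int(pos), last - 1)     # left neighbour of pos
        frac = pos - i                  # distance past that neighbour (0..1)
        resampled.append(series[i] + frac * (series[i + 1] - series[i]))
    return resampled


def vincent_curve(records: Sequence[Sequence[float]], n_points: int = 10) -> List[float]:
    """Average the rescaled records across learners at each fraction."""
    rescaled = [vincentize(r, n_points) for r in records]
    return [sum(column) / len(column) for column in zip(*rescaled)]


# Hypothetical weekly output for two learners who reach the criterion
# after different numbers of weeks.
learner_a = [20.0, 45.0, 60.0, 70.0]
learner_b = [10.0, 30.0, 50.0, 62.0, 70.0, 75.0, 80.0]
print(vincent_curve([learner_a, learner_b], n_points=3))  # [47.5, 65.0, 75.0]
```

Because each record is stretched or shrunk to its own criterion, fast and slow learners contribute equally at every fraction of the learning period, which is what lets curves of different lengths be averaged.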


1. Increases in productivity continued over
long periods beyond the initial plateau for 
individual workers. 

2. Differences in complexity and adjustive 
requirements of these tasks were not system- 
atically related to differences in shape or slope 
of the composite learning curves for each job. 

3. The composite curve based on the first 
six tasks analyzed matched very closely the 
composite curve of the remaining six tasks. 

4. One negatively accelerated curve could 
serve as the “typical” curve for all of these 
tasks. This curve showed a sharp initial rise, 
followed by a period of more gradual, nearly 
linear, increase. 


Suggestions concerning the change in the 
requirements of the task during learning were 
proposed to account for the shape of the 
curves, and the length of the period of im- 
provement in the individual curves. 


Received July 11, 1955. 




References 


1. Barnes, R. M. Motion and time study (3rd Ed.). New York: Wiley, 1949. Pp. 498-500.

2. Batson, W. H. Acquisition of skill. Psychol. Monogr., 1916, 21, No. 3 (Whole No. 91).

3. Blankenship, A. B., & Taylor, H. R. Prediction of vocational proficiency in three machine operations. J. appl. Psychol., 1938, 22, 518-526.

4. Book, W. F. The psychology of skill: with special reference to its acquisition in typewriting. Univer. Montana Publ. Psychol., Bull. No. 53, 1-158.

5. Bryan, W. L., & Harter, N. Studies in the psychology and physiology of the telegraphic language. Psychol. Rev., 1897, 4, 27-53.

6. Elwell, J. L., & Grindley, G. C. The effect of knowledge of results on learning and performance. Brit. J. Psychol., 1938, 29, 39-53.

7. Fleishman, E. A., & Hempel, W. E., Jr. Changes in factor structure of a complex psychomotor test as a function of practice. Psychometrika, 1954, 19, 239-252.

8. Fleishman, E. A., & Hempel, W. E., Jr. The relation between abilities and improvement with practice in a visual discrimination reaction test. J. exp. Psychol., 1955, 49, 301-310.

9. Kornhauser, A. W. A statistical study of a group of specialized office workers. J. pers. Res., 1923, 2, 103-123.

10. Krueger, W. C. F. Influence of difficulty of perceptual-motor task upon acceleration of curves of learning. J. educ. Psychol., 1947, 38, 51-53.

11. McGehee, W. Cutting training waste. Personnel Psychol., 1948, 1, 331-340.

12. Peterson, J. Experiments in ball-tossing: the significance of learning curves. J. exp. Psychol., 1917, 2, 178-224.

13. Ruben, G., Trebra, P. V., & Smith, K. U. Dimensional analysis of motion: III. Complexity of movement pattern. J. appl. Psychol., 1952, 36, 274.

14. Smith, P. C., & Gold, R. A. Prediction of success from examination of performance during the training period. J. appl. Psychol., 1956, 40, 83-86.

15. Swift, E. J. Studies in the psychology and physiology of learning. Amer. J. Psychol., 1903, 14, 201-251.

16. Tiffin, J. Industrial psychology. New York: Prentice-Hall, 1947.


The Journal of Applied Psychology 
Vol. 40, No. 3, 1956 


The Effect of Lack of Information on the Undecided
Response in Attitude Surveys 


Marvin D. Dunnette, Walter H. Uphoff, and Merriam Aylward¹,²
Industrial Relations Center, University of Minnesota 


The Industrial Relations Center has developed a Union Attitude Questionnaire to measure the attitudes of union members toward unionism in general and toward various aspects of their local and national unions (1, 2, 3). The questionnaire consists of 77 statements to which respondents indicate various degrees of agreement or disagreement. Results are reported in one or both of two ways: (a) as scores derived by the standard Likert scoring technique, and (b) as the percentages of respondents answering favorably, unfavorably, or undecided to each statement. The questionnaire may be scored for attitudes toward the following six areas: unionism in general, local union in general, local union policies and practices, local union officers, local union administration, and the national union.

During development of the questionnaire, it 
became apparent that a few items were draw- 
ing an unduly large proportion of undecided 
responses. Although the large majority of 
items received fewer than 20 per cent unde- 
cided responses, a few ranged as high as 45 
or 50 per cent. Such large proportions of un- 
decided responses rendered interpretation of 
attitude scores difficult. It was decided, 
therefore, to investigate the basis for the 
large undecided response given to certain of 
the items. This article describes the design 
of the study and the results obtained. 


Method 


During the developmental stages of the Union Attitude Questionnaire, a pool of 121 items was administered to 821 persons belonging to nine different

¹ Funds for this research were provided by the
Graduate School, University of Minnesota. Aid and 
advice from the following persons are gratefully ac- 
knowledged: Dale Yoder, director, and Herbert G. 
Heneman, Jr., assistant director of the Industrial Re- 
lations Center, Donald G. Paterson, Robert Jones, 
Wayne Kirchner, and Lois Boggs. 

² The first and third authors are now with the
Minnesota Mining & Manufacturing Co. 


union groups. Many items comprising the current 
questionnaire received a large proportion of unde- 
cided responses. These items are shown in Table 1. 

It is widely recognized that the response undecided 
may reflect one or more of the following situations: 


1. The respondent may actually be neutral with 
respect to the statement being responded to. 

2. The item may be ambiguous, the respondent 
choosing the undecided category because of inability 
to understand the question or to know “what the 
statement is getting at.” 

3. The respondent may be antagonistic toward the 
whole procedure of completing an attitude scale. 
One way of venting his antagonism and unwilling- 
ness to cooperate would be through a wholesale 
checking of the undecided responses. 

4. The respondent may feel the need to qualify 
his answers. In other words, he may see both sides 
of the question; thus, an undecided response may be 
an effort to “straddle the fence.”

5. Finally, a respondent may lack specific infor- 
mation or facts necessary to the formation of an 
attitude. He just doesn’t know enough about the 
statement to answer wisely; and, as a consequence, 
he gives an undecided response. 


Occurrence of each of these five possibilities is 
plausible in the use of attitude questionnaires. De- 
velopment and administration of the scale and analy- 
sis of responses, however, provide definite safeguards 
against occurrence of at least two of the above al- 
ternatives. For example, several techniques ranging 
from judgment by experts to item-analysis methods 
are commonly employed to identify ambiguous items. 
The final form of a scientifically developed attitude 
questionnaire will ordinarily include few, if any, am- 
biguous statements. Since development of the IRC 
Union Attitude Questionnaire included several meth- 
ods directed toward reducing item ambiguity, it is 
not likely that a significant proportion of undecided 
responses were due to this factor. 

It is equally unlikely that antagonism on the part 
of the respondents has played a significant role in 
the incidence of undecided responses. This conclu- 
sion is based on several lines of evidence. First, 
every effort was made during administration of the
questionnaire to secure the understanding and
cooperation of respondents. The purpose of the survey
was explained, and the confidential nature of
the returns was emphasized. So far, there has been
little evidence of open antagonism on the part of
any survey respondent. When antagonism was great
enough to appear to invalidate the responses,
the questionnaire was discarded. Secondly, com- 


Table 1

Items Receiving Undecided Responses from 20 Per Cent or More of Persons Belonging to Nine Different Unions

% Undecided   Item
20            Our union president lets a few who like to talk take too much time at our meetings.
21            Officers of my union are chosen because they are real leaders.
22            Every labor union should be required to take out a license from the U. S. Government.
23            Our national union provides the necessary facts and helps at negotiation time.
24            My union does not keep careful enough records of all money taken in and spent.
24            My union spends too much time and money on political action.
24            Our union paper gives us only one side of an issue.
25            If you read it in the union paper you know you are getting the facts.
25            Our national union takes its share of our dues but gives us very little help.
26            My union officers spend too much time on things that are of no concern to my union.
27            Our national union interferes too much in our local affairs.
28            There isn’t a better union than the one I belong to.
30            My union looks after labor’s interests in the city council and in the state legislature.
30            My union does not teach us enough labor history.
31            Our national union exercises too much control over the affairs of our local.
31            We give our delegates too much money to spend when they go to conventions.
34            Our union officers know how to get the members to do things for the union.
36            It is practically impossible to elect different officers in our national union.
38            The officers of my national union are paid too much.
40            There is not much “rhyme or reason” to the way our union votes to contribute to the various appeals for money that come to it.
44            We don’t get enough help for our union educational program from the national union.


pleted questionnaires were examined with a view to- 
ward identifying those with an unduly large number 
of undecided responses. Based on these examina- 
tions, it appears that the propensity to choose the 
undecided response differs little from individual to 
individual. It should be noted, finally, that any 
widespread effect leading to wholesale choosing of 
the undecided response would show up in the form 
of a general increase in the percentages of undecided 
responses for all items in the questionnaire. Actually, 
differences among items are far greater than differ- 
ences among individuals. Thus, as has been con- 
cluded above, the role of respondent antagonism in 
leading to undecided responses probably has been 
minor. 

It appears, then, that the major determiners of 
the undecided responses are lack of information 
and actual neutrality or “fence-straddling” attitudes. 
Examination of the content of items in Table 1 
suggests rather definitely that special knowledge may 
be required in order to form attitudes toward the 
areas considered. It appears that union members 
feel a definite lack of information concerning sev- 
eral aspects of their union. Note especially that 
seven of the eight items comprising the National 
Union subscale are among the statements in Table 1.

In order to investigate the relative proportions of 
undecided responses stemming from actual neutrality 
and from lack of information, a sixth alternative 
was added to the items of the Union Attitude Ques- 


tionnaire. This alternative read: I DON’T KNOW 
ENOUGH ABOUT THIS TO ANSWER. 

It was reasoned that persons lacking sufficient 
knowledge to have formed an attitude would check 
this response and that neutral persons would con- 
tinue to check undecided. 

Samples of persons belonging to four different un- 
ions (autoworkers, office workers, retail clerks, and 
sheetmetal workers) were randomly separated into 
two groups. One group received the standard five- 
response questionnaire. The other group received 
the questionnaire with the sixth response added. 
Completed questionnaires were received from 214 
persons in the five-response group (Group I) and 
216 persons in the six-response group (Group II). 


Results 


The study was designed to answer the fol- 
lowing questions: 


1. Does the presence of the “I don’t know” 
response alter the distributions of favorable 
and unfavorable responses? In other words, 
does the “I don’t know” alternative draw re- 
sponses only from the undecided group, or 
does it draw additional responses from per- 
sons with definite attitudes? 




2. Is the proportion of undecided responses 
to items of the questionnaire reduced substan- 
tially by providing the sixth alternative? 

The first question was answered by com- 
paring the proportions of Group I and Group 
II who responded to the various alternatives 
on each of the items. In order to determine 
whether or not “I don’t know” responses were 
drawn entirely from the undecided group, the 
undecided and “I don’t know” responses were 
totaled for Group II and compared with the 
proportion of undecided responses obtained 
from Group I. A total of 385 (77 × 5) comparisons was made. Differences were tested
for significance by using Zubin’s tables (5). 
The distributions of differences and the num- 
ber of significant differences are shown in 
Table 2. 
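Zubin's nomographs (5) are a graphical aid for judging whether two observed frequencies differ significantly. A conventional two-sided z test for two independent proportions asks the same question and is sketched below as a modern stand-in; it is not claimed to reproduce the nomograph readings exactly. The group sizes 214 and 216 come from the text; the counts in the example call are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_z(x1: int, n1: int, x2: int, n2: int):
    """Two-sided z test for the difference between two independent
    proportions, using the pooled proportion for the standard error."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


# Hypothetical counts for one item and category: 45 of the 214 Group I
# respondents versus 30 of the 216 Group II respondents.
z, p = two_proportion_z(45, 214, 30, 216)
print(f"z = {z:.2f}, two-sided p = {p:.3f}")
```

Running this once per item and response category gives the 385 tests described above.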

Only 27 differences were significant at or 
beyond the 5% level. The number to be expected by chance alone is about 19 (385 × .05); thus, only eight of the differences can
be attributed to nonchance factors. This is 
striking evidence that persons who choose the 
“I don’t know” response are drawn almost 
entirely from the group who would otherwise 
choose undecided. It appears that the inclu- 
sion of a sixth-response alternative—“I don’t 
know enough about this to answer”—has no 
effect on the responses of persons who have 
formed favorable or unfavorable attitudes. 
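The chance expectation in the preceding paragraph is simple arithmetic: 385 comparisons at the .05 level yield about 19 significant differences by chance alone, leaving roughly eight beyond chance. The sketch below repeats that arithmetic and adds a binomial upper-tail probability for observing 27 or more. Treat that probability as rough. It assumes the 385 tests are independent, and they are not: the five categories of an item are linked (their percentages sum to 100), and all items are answered by the same respondents.

```python
from math import comb


def binomial_upper_tail(k: int, n: int, p: float) -> float:
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p ** i * (1 - p) ** (n - i) for i in range(k, n + 1))


n_comparisons = 77 * 5      # items x response categories
alpha = 0.05
observed = 27               # differences significant at or beyond .05

expected = n_comparisons * alpha          # about 19.25
excess = observed - round(expected)       # the "eight" nonchance differences
p_tail = binomial_upper_tail(observed, n_comparisons, alpha)
print(expected, excess, round(p_tail, 3))
```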

Data presented in Table 3 bear on the 
second research question. It is clear that the 




proportion of undecided responses has been 
reduced substantially by inclusion of the 
sixth response. Evidently, a large segment of 
the undecided group is made up of persons 
who do not have sufficient information to 
form an opinion. 
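The article does not say how the medians at the foot of Table 3 were obtained. The usual grouped-data formula, interpolating within the median class and treating whole-percentage classes such as 20-24 as running from 19.5 to 24.5, is a plausible assumption. Applied to the item counts of Table 3, it gives 22.25 and 11.73, consistent with the printed 22.2% and 11.7%:

```python
def grouped_median(lower_limits, width, counts):
    """Median of grouped data by linear interpolation within the median class.
    Classes are whole percentages, so the class printed as 20-24 is treated
    as running from 19.5 to 24.5."""
    half = sum(counts) / 2
    cumulative = 0
    for lower, f in zip(lower_limits, counts):
        if f > 0 and cumulative + f >= half:
            return (lower - 0.5) + (half - cumulative) / f * width
        cumulative += f
    raise ValueError("counts must not all be zero")


lower_limits = list(range(0, 65, 5))                    # 0-4, 5-9, ..., 60-64
group_1 = [1, 5, 19, 8, 10, 13, 4, 6, 3, 6, 0, 1, 1]    # five responses
group_2 = [4, 22, 28, 13, 8, 2, 0, 0, 0, 0, 0, 0, 0]    # six responses
print(round(grouped_median(lower_limits, 5, group_1), 2))  # 22.25
print(round(grouped_median(lower_limits, 5, group_2), 2))  # 11.73
```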


Discussion 


It has been argued in this paper that am- 
biguity of items and antagonism on the part 
of questionnaire respondents probably play 
minor roles in the incidence of undecided re- 
sponses on a well-developed, well-adminis- 
tered questionnaire. Experimental evidence 
does suggest, however, that an important seg- 
ment of the undecided group consists of per- 
sons who don’t know enough about the state- 
ment to answer. 

An important question remains—one which 
was not investigated in this study. It bears 
on the interpretation to be given to the seg- 
ment of undecided responses coming from 
persons who feel they do know enough about 
the question to answer. Since other alterna- 
tives have been excluded, it is probable that 
such persons are truly neutral in the sense 
that they have pondered the pros and cons of 
a question and have arrived at a point some- 
where between favorableness and unfavorable- 
ness. 

Evidence in support of this contention 
comes from a study by Rosen and Rosen (4). 
For each item of an attitude questionnaire 


Table 2

Distributions of Differences Between Percentages in Corresponding Response Categories of Groups I and II

                                                   Number of Items
Differences Between Percentages   Strongly                                            Strongly
Choosing Various Responses        Unfavorable  Unfavorable  Undecided*   Favorable    Favorable
in Groups I and II
0-1                               49           25           20           24           27
2-5                               28           34           47           32           41
6-10                              0            17           10           19           8
>10                               0            1            0            2            1
Median difference**               0%           +3%          -1%          0%           -2%
Number significant at 5% level    3            4            3            8            5
Number significant at 1% level    0            1            0            0            3

* For Group II, responses for undecided and I don't know enough about this to answer were summed and compared with the undecided response in Group I.

** In each case, the proportion in Group II was subtracted from the proportion in the corresponding category of Group I.




Table 3

Percentage of Undecided Responses Given by Groups I and II

                      Number of Items
% Undecided      Group I          Group II
                 (5 responses)    (6 responses)
0-4              1                4
5-9              5                22
10-14            19               28
15-19            8                13
20-24            10               8
25-29            13               2
30-34            4                0
35-39            6                0
40-44            3                0
45-49            6                0
50-54            0                0
55-59            1                0
60-64            1                0
Median value     22.2%            11.7%


(measuring attitudes of members toward their 
unions), they asked respondents to indicate: 
(a) the extent to which the practice should or 
should not be followed, (b) their perception 
of the extent to which it actually was being 
followed, and (c) their general level of satis- 
faction with the current practice as it was 
perceived. It was reasoned that for satisfied 
persons, perceptions would correspond closely 
with desires. On the other hand, dissatisfac- 
tion would stem from a low correspondence 
between perceptions and desires, and unde- 
cided (or neutral) persons would exhibit a 
degree of correspondence somewhere between 
that of satisfied and dissatisfied persons. 

The Rosens’ hypothesis was supported on each of the 27 items of their questionnaire. Differences in the average degree of correspondence between desires and perceptions of satisfied and dissatisfied persons were statistically significant on all but two items; the average degree of correspondence between desires and perceptions shown by undecided respondents was between that of the other two groups on all items of the questionnaire.

Results from the present study and from 
the Rosens’ study lead to the following con- 
clusion: Undecided responses to items of a 
scientifically developed, professionally admin- 
istered questionnaire stem from two major 
sources: (a) persons who lack sufficient in- 
formation on the point in question to form 
an attitude or answer wisely, and (b) persons who do have knowledge of the point in
question, have considered the pros and cons, 
and have arrived at a neutral or “undecided” 
point. 

It would appear that the “I don’t know” 
response would be a wise addition to attitude 
surveys. This is especially true when items 
of the survey require information (e.g., knowl- 
edge of the activities of the national union) 
which may not be commonly held by the 
survey respondents. 


Received July 11, 1955. 


References 


1. Aylward, Merriam, Uphoff, W. H., Kirchner, W. K., & Dunnette, M. D. Development and validation of a union attitude questionnaire. Mimeo. Release No. 7. Minneapolis: Industrial Relations Center, Univer. of Minnesota, June, 1955.

2. Dunnette, M. D., & Uphoff, W. H. Union attitudes and membership participation. Business News Notes, School of Business Administration, 1955, No. 24.

3. Dunnette, M. D., Kirchner, W. K., & Uphoff, W. H. Development and validation of a union attitude questionnaire. J. pers. Admin., in press.

4. Rosen, H., & Rosen, Ruth. The validity of “undecided” answers in questionnaire responses. J. appl. Psychol., 1955, 39, 178-181.

5. Zubin, J. Nomographs for determining the significance of the difference between the frequency of events in two contrasted series or groups. J. Amer. statist. Ass., 1939, 34, 539-544.




The Prediction of Attrition in Trade School Courses 


C. H. Patterson¹,²,³


Veterans Administration Regional Office, St. Paul, Minnesota 


The study of factors related to success in 
trade school training has received much less 
attention than the prediction of success in 
college training. Nevertheless, a considerable
number of studies does exist. These have been
reviewed elsewhere (5) and will not be discussed
here. Most of these studies are inadequate
in one or more respects, so that there is
a need for studies utilizing adequate samples, 
an acceptable design including necessary con- 
trols, and appropriate statistical analyses the 
assumptions of which are met by the data. 
The present report is of a study attempting 
to meet these requirements. 

The institution and sample. The school 
studied is a large, privately endowed, non- 
profit school in a large city in the Middle 
West. A scattering of students is drawn from 
the entire country, but almost all are from 
Minnesota. Applicants must be 16 years of 
age or over with a general educational re- 
quirement stated as follows: “Educational 
background sufficient to indicate successful 
progress in the training program is the basic 
requirement for admission.” This is inter- 
preted to mean ordinarily the completion of 
the eighth grade, though a few students are 
admitted with completion of seven years of 
education. Very few applicants are denied 
admission because of poor school records, and 
a few are discouraged, but not denied admis- 
sion if they persist. 

The students in this study include those 
entering the school during the first three 
monthly enrollment dates of the 1953-54 
school year. The courses and the distribu- 
tion of the students among them are shown 


¹ The writer is indebted to the staff of the institution studied, who made the investigation possible, but who prefer to remain anonymous.

² Now at the University of Illinois.

³ Although this article has been approved for publication by the Veterans Administration, the conclusions reached are those of the author and do not necessarily reflect the position of the Veterans Administration.


in Table 1 (Total Group). Over 80 per cent 
of all students entering these courses were in- 
cluded; those not included had incomplete 
data or reported too late to take the tests. 
The mean age of the sample of 350 students 
was 21.99, with an SD of 4.32. Mean educa- 
tion completed was 11.54, with an SD of 1.38. 

The sample used in the present study was 
compared to students tested during the re- 
mainder of the school year and the beginning 
of the 1954-55 school year. The three groups 
were similar in age, education, and test scores. 

Measuring instruments used. The follow- 
ing tests were used: the Bennett Test of Me- 
chanical Comprehension, Form AA (1), the 
Revised Minnesota Paper Form Board, Form 
MA (3), and the Army General Classification
Test, First Civilian Edition, Form AM (7).
Data on nonability factors were collected by
means of a questionnaire which included the 
areas of personal data (age, marital status, 
dependents, veteran status, and length of 
time elapsed since the decision to enter the 
school), socioeconomic background (father’s 
occupation, father’s education, and urban or 
rural background), educational background 
(education completed, attitude toward school, 
subject liked best, number of shop, mathe- 
matics, and science courses reported taken, 
and interval since last school attendance),
and previous vocational training and experi- 
ence (work experience, previous training, and 
previous experience in the occupation se- 
lected). 

The criterion. The criterion used was the 
dichotomy of completion or noncompletion of 
the first six months of the course. This is an 
objective, practically significant criterion to 
attempt to predict. Approximately 40 per 
cent of those students entering the courses 
studied fail to complete more than six months 
of training. The proportion dropping after 
six months decreases rapidly; furthermore 
some of those leaving after completion of sev- 
eral months of the course are not failures, but 


Table 1

Pass-Fail Status of Students in Groups I and II and the Total Group by Courses

                                                Group I       Group II       Total
Course                                         Pass  Fail    Pass  Fail    Pass  Fail
Air Conditioning—General                          6     2       5     3      11     5
Air Conditioning—Refrigeration                    7     2       5     4      12     6
Automobile—General                               16    13      21     9      37    22
Automobile—Electrical                             4     1       3     2       7     3
Building Construction—Drafting and Estimation    10     7      12     4      22    11
Building Construction—Carpentry                   5     4       8     1      13     5
Electrical—General                               17     3      12     9      29    12
General Mechanics                                 1     2       2     1       3     3
Highway, RR, and Municipal Construction           3     2       4     1       7     3
Machine Shop                                     12     5      13     3      25     8
Mechanical Drafting                               4    11       8     8      12    19
Printing                                          9     1       4     5      13     6
Radio and Electronics                            13    11      16     8      29    19
Welding                                           2     2       2     2       4     4
Total                                           109    66     115    60     224   126


leave to take jobs or enter apprenticeship 
training in the trade. 

It is recognized that such a criterion is a 
complex one. Students leave school for many 
reasons in addition to inability to handle the 
work, including interest, motivation, person- 
ality factors, etc. Some leave for reasons be- 
yond their control, such as personal illness, 
illness or death in the family, financial diffi- 
culty, or being drafted into service. An at- 
tempt was made to identify such students, 
from the reasons given by the students for 
leaving. These stated reasons are probably unreliable.
Those students giving one of the reasons listed 
above, with the exception of financial diffi- 
culty, were identified, and if they were also 
doing satisfactory work they were not in- 
cluded in the drop-out category. Twenty- 
five students in all were eliminated on this 
basis from the total group of 375 students 
upon whom complete data were available, 
leaving 350 to constitute the sample. 
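Several Group II cells of Table 1 (in the Air Conditioning (General), Electrical (General), and Printing rows) are illegible in the scan. The values used below are inferred from the printed row totals and the column totals (109, 66, 115, 60), so they are a reconstruction rather than the printed figures. The sketch checks that the reconstructed cells reproduce the printed totals and computes the six-month attrition rate for the retained sample of 350 and for each course.

```python
# (Group I pass, Group I fail, Group II pass, Group II fail) by course,
# as reconstructed from Table 1.
TABLE_1 = {
    "Air Conditioning - General": (6, 2, 5, 3),
    "Air Conditioning - Refrigeration": (7, 2, 5, 4),
    "Automobile - General": (16, 13, 21, 9),
    "Automobile - Electrical": (4, 1, 3, 2),
    "Building Construction - Drafting and Estimation": (10, 7, 12, 4),
    "Building Construction - Carpentry": (5, 4, 8, 1),
    "Electrical - General": (17, 3, 12, 9),
    "General Mechanics": (1, 2, 2, 1),
    "Highway, RR, and Municipal Construction": (3, 2, 4, 1),
    "Machine Shop": (12, 5, 13, 3),
    "Mechanical Drafting": (4, 11, 8, 8),
    "Printing": (9, 1, 4, 5),
    "Radio and Electronics": (13, 11, 16, 8),
    "Welding": (2, 2, 2, 2),
}

column_totals = [sum(row[j] for row in TABLE_1.values()) for j in range(4)]
passed = column_totals[0] + column_totals[2]
failed = column_totals[1] + column_totals[3]
print(column_totals, passed, failed, round(failed / (passed + failed), 3))
# [109, 66, 115, 60] 224 126 0.36

# Six-month attrition by course, highest first.
rates = {course: (f1 + f2) / (p1 + f1 + p2 + f2)
         for course, (p1, f1, p2, f2) in TABLE_1.items()}
for course, rate in sorted(rates.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{course:50s} {rate:.2f}")
```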


Design and procedure. The tests and the ques- 
tionnaire were administered prior to the beginning 
of classes. Six months later each student was classified as (a) failure: not in school (N = 126), (b) success: still in school (N = 224), or (c) neither: left school, apparently for reasons beyond his control (N = 25). Those in the last category were discarded from the study. The remaining 350 constituting the sample were split into random halves, by course. Each half was used both as an experimental and as a control group in the procedure of double cross validation; the halves are designated Group I and Group II in the tables.
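Splitting "into random halves, by course" amounts to a stratified random split. The sketch below is one way to do it; the function name and data are hypothetical. Each course's students are shuffled and divided between the halves, and when a course has an odd number of students the extra one goes to whichever half is currently smaller. In double cross validation, weights derived in one half are then applied to the other half, and vice versa.

```python
import random
from collections import defaultdict


def split_halves_by_course(students, seed=None):
    """Split (student_id, course) pairs into two random halves, course by
    course, so that each course contributes (nearly) equally to both."""
    rng = random.Random(seed)
    by_course = defaultdict(list)
    for student_id, course in students:
        by_course[course].append(student_id)
    half_1, half_2 = [], []
    for course in sorted(by_course):
        ids = by_course[course][:]
        rng.shuffle(ids)
        cut = len(ids) // 2
        # With an odd count, give the extra student to the currently smaller half.
        if len(ids) % 2 and len(half_1) < len(half_2):
            cut += 1
        half_1.extend(ids[:cut])
        half_2.extend(ids[cut:])
    return half_1, half_2


# Hypothetical roster: 19 printing students and 8 welding students.
roster = [(i, "Printing") for i in range(19)] + [(i, "Welding") for i in range(19, 27)]
group_i, group_ii = split_halves_by_course(roster, seed=1)
print(len(group_i), len(group_ii))
```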

The questionnaire items were analyzed by means 
of χ². The test-criterion relationships were studied
by means of biserial correlation and the linear dis- 
criminant function. 
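The χ² analysis of a questionnaire item compares the distribution of answers among passing and failing students. A minimal sketch of the Pearson statistic for an r × c table of counts follows. The counts and item wording in the example are hypothetical. With 1 degree of freedom, the .05 critical value is 3.84.

```python
def chi_square(table):
    """Pearson chi-square statistic and degrees of freedom for an r x c
    table of observed counts (rows: answer categories, columns: pass/fail)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return stat, df


# Hypothetical item: answer category by criterion group (pass, fail).
example = [[60, 20],   # e.g. "has had some work experience"
           [50, 45]]   # e.g. "has had no work experience"
print(chi_square(example))
```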

The two random halves of the total group were 
compared in all measured characteristics. They did 
not differ in any respect except the correlations of 
the individual tests with the criterion. The biserial 
correlations of the Bennett, the AGCT Blocks, and 
the AGCT total scores were significantly lower in 
Group I. This is an important factor in the results 
and in the cross validation. The reasons for these 
differences are unknown. They suggest that great 
caution should be used in accepting a second sample 
as equivalent in cross validation. 


Results: Background factors. The results 
for the 18 background and socioeconomic
factors studied are relatively meager. The 
individual most likely to persist in trade 
school training is between 20 and 30 years of 
age, has had several shop, science, and math 
courses in school, and has had some work 
experience, but not necessarily in the field in 
which he is training. Although number of 
years of education completed was not sig- 
nificant, an analysis comparing high school 
graduates with nongraduates indicated that 


Table 2

Biserial Correlations of AGCT, Bennett, and Paper Form Board Tests with Pass-Fail Criterion for Group I, Group II, and the Total Group

                  Group I                    Group II                   Total Group
                  Pass     Fail              Pass     Fail              Pass     Fail
                  Mean     Mean              Mean     Mean              Mean     Mean
Test              (N=109)  (N=66)   rbis     (N=115)  (N=60)   rbis     (N=224)  (N=126)  rbis***
AGCT—Vocabulary   30.56    27.47    .297**   29.89    25.88    .366**   30.22    26.71    .325**
  Blocks          31.02    29.76    .104     32.53    26.73    .505**   31.79    28.32    .294**
  Arithmetic      36.54    34.02    .257**   36.56    32.20    .438**   36.55    33.15    .343**
  Total           97.66    91.09    .266**   98.81    84.73    .535**   98.35    88.06    .397**
Bennett           44.64    41.74    .213*    45.43    38.22    .509**   45.04    40.06    .358**
RMPFB             47.45    43.30    .261**   46.61    40.62    .421**   47.02    42.02    .342**

* Significant at the .05 level.
** Significant at the .01 level.
*** The standard error of an rbis of .00 (assuming the null hypothesis) is .096 for Group I, .097 for Group II, and .069 for the Total Group.


high school graduates were more likely to per- 
sist in training. 

Results: Tests. The biserial correlations between the aptitude tests and the criterion are shown in Table 2. The overlap between the criterion groups on the tests is indicated by the fact that, in the total group, cutting scores set on each test to eliminate from 46 to 50 per cent of the failures would also eliminate from 21 to 29 per cent of the successful students.
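The footnote to Table 2 can be checked directly. A minimal sketch, assuming the usual large-sample formula for the standard error of a biserial r under the null hypothesis, SE = √(pq)/(y√N), where p and q are the proportions passing and failing and y is the unit-normal ordinate at the point dividing them, and assuming the pass/fail splits shown in Table 4 (109/66 for Group I, 115/60 for Group II) and Table 2 (224/126 for the total group). The significance helper assumes two-tailed normal critical values of 1.96 and 2.576; it is an illustration, not the author's computation.

```python
from math import sqrt
from statistics import NormalDist

_UNIT_NORMAL = NormalDist()  # mean 0, standard deviation 1


def se_rbis_null(n_pass: int, n_fail: int) -> float:
    """Standard error of a biserial r of zero: sqrt(p*q) / (y * sqrt(N))."""
    n = n_pass + n_fail
    p = n_pass / n
    q = 1.0 - p
    # y is the height of the unit normal curve at the point that cuts it into p and q.
    y = _UNIT_NORMAL.pdf(_UNIT_NORMAL.inv_cdf(p))
    return sqrt(p * q) / (y * sqrt(n))


def significance_marks(r_bis: float, n_pass: int, n_fail: int) -> str:
    """'**' if |r|/SE >= 2.576, '*' if >= 1.960 (two-tailed normal test), else ''."""
    z = abs(r_bis) / se_rbis_null(n_pass, n_fail)
    if z >= 2.576:
        return "**"
    if z >= 1.960:
        return "*"
    return ""


if __name__ == "__main__":
    for label, n_pass, n_fail in [("Group I", 109, 66), ("Group II", 115, 60), ("Total", 224, 126)]:
        print(f"{label}: SE = {se_rbis_null(n_pass, n_fail):.3f}")
    # Group I entries from Table 2: Blocks .104, Bennett .213, Arithmetic .257.
    print([significance_marks(r, 109, 66) for r in (0.104, 0.213, 0.257)])
```

With these splits the function returns roughly .096, .098, and .069; the Group II value differs from the printed .097 only in the third decimal, presumably from rounding or table lookup. The marks for the Group I Blocks, Bennett, and Arithmetic entries come out as no mark, one star, and two stars, matching Table 2.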

Test scores were combined by means of the linear discriminant function. Analysis of the data indicated that they satisfied the assumptions of multivariate normality and equality of the variance-covariance matrices well enough to justify applying this technique. Hotelling's generalized T² test indicated that the five test or subtest scores taken together discriminated significantly between the two criterion groups in the total group.

Table 3
Significance of Increase in D² Resulting from the Addition of Tests

Tests*              Group I   Group II   Total Group
1-0                 <.01      <.01       <.01
2-1                 >.05      <.01       <.01
3-(1+2)             <.01      .05        <.01
4-(1+2+3)           >.05      <.01       >.05
5-(1+2+3+4)         >.05      >.05       >.05
6-(1+2)             >.05      <.01       <.01
3-0                 <.01      <.01       <.01
4-3                 >.05      <.01       <.01
5-(3+4)             >.05      >.05       >.05

* 1 = RMPFB, 2 = Bennett, 3 = AGCT-Vocabulary, 4 = AGCT-Blocks, 5 = AGCT-Arithmetic, 6 = AGCT-Total.

Three combinations of the five test scores were analyzed for significance of discrimination in the two groups and the total sample. Table 3 gives the levels of significance of the various tests and combinations as measured by the D² statistic, which indicates the distance by which the two criterion groups are separated. It is apparent that using the part scores of the AGCT does not result in significantly greater discrimination than using the total score alone. The best combination of scores is the Paper Form Board, Bennett, and AGCT total. Although in Group I the addition of the latter two does not increase the discrimination significantly over that obtained with the Paper Form Board alone, they were retained in the equations for purposes of cross validation.
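For readers who want to see what D² measures, the sketch below computes, for two groups of score vectors, the pooled within-groups covariance matrix S, the linear discriminant weights S⁻¹d (d being the difference between the group mean vectors), Mahalanobis' D² = d′S⁻¹d, and Hotelling's T² with its exact F transformation. It uses only the standard library, with a small Gaussian-elimination solver standing in for a matrix package, and the scores in the example are hypothetical. It does not implement the test for the increase in D² when tests are added, which is what Table 3 reports, and the article's printed weights may be scaled differently from S⁻¹d.

```python
from statistics import fmean

Vector = list[float]
Matrix = list[list[float]]


def mean_vector(rows: Matrix) -> Vector:
    """Column means of a list of score vectors."""
    return [fmean(column) for column in zip(*rows)]


def pooled_covariance(group_a: Matrix, group_b: Matrix) -> Matrix:
    """Within-groups covariance matrix on n_a + n_b - 2 degrees of freedom."""
    p = len(group_a[0])
    sums = [[0.0] * p for _ in range(p)]
    for rows in (group_a, group_b):
        centre = mean_vector(rows)
        for row in rows:
            dev = [x - m for x, m in zip(row, centre)]
            for i in range(p):
                for j in range(p):
                    sums[i][j] += dev[i] * dev[j]
    dof = len(group_a) + len(group_b) - 2
    return [[s / dof for s in row] for row in sums]


def solve(a: Matrix, b: Vector) -> Vector:
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]  # augmented copy
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def two_group_discriminant(group_a: Matrix, group_b: Matrix):
    """Return (weights, D2, T2, F, df1, df2) for two groups of score vectors."""
    d = [ma - mb for ma, mb in zip(mean_vector(group_a), mean_vector(group_b))]
    weights = solve(pooled_covariance(group_a, group_b), d)  # S^-1 d
    d2 = sum(w * di for w, di in zip(weights, d))  # Mahalanobis D^2 = d' S^-1 d
    n_a, n_b, p = len(group_a), len(group_b), len(d)
    n = n_a + n_b
    t2 = n_a * n_b / n * d2  # Hotelling's two-sample T^2
    f = (n - p - 1) / (p * (n - 2)) * t2  # distributed as F(p, n - p - 1) under H0
    return weights, d2, t2, f, p, n - p - 1


if __name__ == "__main__":
    # Hypothetical (Form Board, Bennett) scores, not the study's data.
    persisted = [[48, 46], [50, 44], [45, 47], [47, 43]]
    dropped = [[42, 40], [44, 41], [41, 39], [43, 42]]
    w, d2, t2, f, df1, df2 = two_group_discriminant(persisted, dropped)
    print(f"weights = {w}, D2 = {d2:.3f}, T2 = {t2:.3f}, F({df1}, {df2}) = {f:.3f}")
```

With a single test, D² reduces to the squared mean difference divided by the pooled variance, and F equals the square of the ordinary two-sample t.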

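The linear discriminant weights are as follows (subscripts as in Table 3):

$$
\begin{aligned}
\text{Group I:}\quad & L = .034955X_1 + .013808X_2 + .015998X_6\\
\text{Group II:}\quad & L = .029408X_1 + .074183X_2 + .043446X_6\\
\text{Total Group:}\quad & L = .033525X_1 + .039706X_2 + .026995X_6
\end{aligned}
$$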


Equivalent multiple point-biserial R's, obtained by a method described by Fisher (2), are .36, .51, and .45, respectively, for the three equations.
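Because the composite L is linear, its mean in each criterion group is simply L evaluated at that group's mean scores. The sketch applies the Total Group weights to the Total Group means in Table 2 and places a cutting point halfway between the two composite means. The dictionary keys and the `shift` parameter are hypothetical names; the article's correction for the relative proportions of passes and failures is not specified here, so `shift` only marks where such an adjustment would enter.

```python
# Total Group weights from the discriminant equation (X1 = RMPFB, X2 = Bennett, X6 = AGCT total).
TOTAL_GROUP_WEIGHTS = {"RMPFB": 0.033525, "Bennett": 0.039706, "AGCT_total": 0.026995}

# Total Group means from Table 2.
PASS_MEANS = {"RMPFB": 47.02, "Bennett": 45.04, "AGCT_total": 98.35}
FAIL_MEANS = {"RMPFB": 42.02, "Bennett": 40.06, "AGCT_total": 88.06}


def composite(scores: dict[str, float], weights: dict[str, float] = TOTAL_GROUP_WEIGHTS) -> float:
    """Discriminant composite L = sum of weight * score over the three tests."""
    return sum(weights[test] * scores[test] for test in weights)


def midpoint_cutoff(mean_l_pass: float, mean_l_fail: float, shift: float = 0.0) -> float:
    """Cut halfway between the two group composites.

    `shift` is a hypothetical hook for a base-rate correction; the article's
    correction for unequal proportions is not reproduced here.
    """
    return (mean_l_pass + mean_l_fail) / 2 + shift


def predict_pass(scores: dict[str, float], cutoff: float,
                 weights: dict[str, float] = TOTAL_GROUP_WEIGHTS) -> bool:
    """Predict persistence when the composite reaches the cutoff."""
    return composite(scores, weights) >= cutoff


if __name__ == "__main__":
    l_pass, l_fail = composite(PASS_MEANS), composite(FAIL_MEANS)
    cut = midpoint_cutoff(l_pass, l_fail)
    print(f"L(pass) = {l_pass:.4f}, L(fail) = {l_fail:.4f}, cutoff = {cut:.4f}")
```

With these figures the composite means come to about 6.020 for the successful students and 5.377 for the failures.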

The weights given in the preceding equa- 


Prediction of Attrition in Trade School Courses 


tions were used, with a correction for relative 
proportions in the criterion categories, to ob- 
tain criterion scores for each category and a 
criterion discriminatory score for each group. 
These weights and criterion discriminatory 
scores for Group I and Group II were used in 
cross validating on the group not used in ob- 
taining the weights. The results of this dou- 
ble cross validation are shown in Table 4. It 
is apparent that the weights obtained on 
Group II do not hold up when applied to 
Group I, while those obtained on Group I do 
succeed in discriminating significantly when 
applied to Group II. The t test of the significance of the difference between the number correctly classified and the number expected to be correctly classified by chance (a more stringent test of the results than χ²) is 3.90 for the latter cross validation, which is significant beyond the .001 level.
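The cross-validation tables can be checked with a few lines of arithmetic. The sketch computes the Pearson χ² for a 2 × 2 table without a continuity correction, which gives 24.87 and 2.71 for the two halves of Table 4 and 20.69 for Table 5. The article does not spell out its t test of hits against chance; the version below is an assumption (chance hits taken from the table margins, binomial standard error). It gives about 3.89 for Group II, close to the reported 3.90, but about 3.38 for Table 5 against the reported 3.11, so treat it as an approximation rather than the author's formula.

```python
from math import sqrt


def chi_square_2x2(table: list[list[int]]) -> float:
    """Pearson chi-square (no continuity correction) for a 2 x 2 table."""
    n = sum(map(sum, table))
    rows = [sum(r) for r in table]
    cols = [sum(c) for c in zip(*table)]
    return sum(
        (table[i][j] - rows[i] * cols[j] / n) ** 2 / (rows[i] * cols[j] / n)
        for i in range(2)
        for j in range(2)
    )


def hits_vs_chance(table: list[list[int]]) -> float:
    """(observed hits - chance hits) / sqrt(N * p0 * q0), where chance hits
    come from the table margins (an assumed form of the test)."""
    n = sum(map(sum, table))
    rows = [sum(r) for r in table]
    cols = [sum(c) for c in zip(*table)]
    observed = table[0][0] + table[1][1]
    expected = (rows[0] * cols[0] + rows[1] * cols[1]) / n
    p0 = expected / n
    return (observed - expected) / sqrt(n * p0 * (1.0 - p0))


if __name__ == "__main__":
    # Rows: actual Pass, Fail; columns: predicted Pass, Fail (Table 4).
    for label, table in [("Group II, Group I weights", [[104, 11], [35, 25]]),
                         ("Group I, Group II weights", [[92, 17], [49, 17]])]:
        print(label, round(chi_square_2x2(table), 2), round(hits_vs_chance(table), 2))
```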

These results, which are a consequence of 
the differences in validities of the tests in the 
two groups, are equivocal as to the value of 
the tests in selection. They indicate the need 
for further cross validation. 

Further cross validation. Additional cross 
validation was possible, since criterion data 
became available on 302 students tested dur- 
ing the latter part of the 1953-54 school year. 
Four of these students were eliminated as 
having left for reasons beyond their control. 
The mean age of this sample was 22.86 (SD
3.47) and mean years of education completed 
was 11.52 (SD 1.38). 

Discriminant weights and discriminant cri- 
terion scores determined upon the total group 
of 350 cases previously studied were applied 
to these 298 new cases, 170 of whom were 
successful and 128 failures, with the results 
shown in Table 5, where the new sample is 
designated as Group B and the previous group 
of 350 cases is designated as Group A. 
Forty-three per cent of Group B failed, 
compared to 36 per cent of Group A, which 
accounts in part at least for the relatively 
large proportion of failures who were predicted as successful. The t ratio of the obtained versus chance accuracy of prediction
is 3.11, significant beyond the .001 level. 

Discussion and summary. The results obtained indicate that it is possible to predict, with significantly greater than chance success, persistence in trade school training for at least six months by use of the Revised Minnesota Paper Form Board Test, the Bennett Test of Mechanical Comprehension, and the Army General Classification Test.

Table 4
Results of Cross Validation Using Group I Weights on Group II and Group II Weights on Group I

                           Predicted Classification
                   Group II Using             Group I Using
Actual             Group I Weights*           Group II Weights**
Classification     Pass   Fail   Total        Pass   Fail   Total
Pass                104     11     115          92     17     109
Fail                 35     25      60          49     17      66
Total               139     36     175         141     34     175

* χ² = 24.87, p < .001.
** χ² = 2.71, .05 < p < .10.

The degree of accuracy of prediction 
achieved leaves much to be desired. A cut- 
ting score on the composite criterion score set 
to eliminate 50 per cent of the drop-outs 
would also eliminate 30 per cent of the suc- 
cessful. Cutting scores could be set to elimi- 
nate 29 per cent of the drop-outs while re- 
jecting 9 per cent of the successful, or to 
eliminate 4 per cent of the drop-outs without 
rejecting any of the successful. 
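The trade-off described above, between the share of drop-outs screened out and the share of successful students lost, can be tabulated for any cutting score on the composite. A minimal sketch; the composite scores and outcomes in the example are made up, not the study's data.

```python
def cutoff_tradeoff(composites: list[float], persisted: list[bool],
                    cutoff: float) -> tuple[float, float]:
    """Share of drop-outs and share of persisting students rejected by `cutoff`.

    Anyone whose composite score falls below the cutoff is rejected.
    """
    def share_rejected(scores: list[float]) -> float:
        return sum(score < cutoff for score in scores) / len(scores)

    dropouts = [c for c, ok in zip(composites, persisted) if not ok]
    successes = [c for c, ok in zip(composites, persisted) if ok]
    return share_rejected(dropouts), share_rejected(successes)


if __name__ == "__main__":
    # Hypothetical composite scores, not the study's data.
    scores = [5.0, 5.2, 5.5, 5.9, 5.4, 5.8, 6.0, 6.3, 6.5]
    stayed = [False, False, False, False, True, True, True, True, True]
    for cut in (5.3, 5.6, 6.1):
        dropped, lost = cutoff_tradeoff(scores, stayed, cut)
        print(f"cutoff {cut}: {dropped:.0%} of drop-outs, {lost:.0%} of successful rejected")
```

Raising the cutoff screens out more drop-outs but also more successful students, which is the pattern the percentages above describe.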

The level of prediction achieved is reduced by two factors which must be recognized. The first is the fact that the criterion is admittedly complex. Students drop out for many reasons, including ability, interest, and motivation, as well as for reasons beyond their control. An analysis of the grades of the 126 drop-outs in Group A indicated that 3 were doing above average work at the time they left school, and 23 were doing average work. For Group B, 13 were above average, and 22 average.

Table 5
Results of Cross Validation Using Group A (Total Group) Weights on Group B

                    Predicted Classification*
Actual
Classification      Pass   Fail   Total
Pass                 156     14     170
Fail                  92     36     128
Total                248     50     298

* χ² = 20.69, p < .001.

A second factor affecting the results is that 
of differences among the courses studied. In 
the present study, 15 courses were grouped 
together, since there were not enough stu- 
dents in each course to study them separately. 
This study produced evidence from the analysis of variance of the course means that there are differences on the tests among students entering the various courses (4). The course means on these tests correlate significantly (ρ's = .59 to .80) with rankings of the courses by difficulty level by two competent judges, who correlated .93 in their rankings.
There is further evidence that, although stu- 
dents thus appear to select themselves to some 
extent in terms of the difficulty levels of the 
courses, the drop-out ratio still may vary sig- 
nificantly among the courses, and that this 
ratio may be related positively to the diffi- 
culty levels of the courses. 

These considerations indicate that it would be desirable to study further the relationships between these tests and a modified criterion better reflecting achievement in the school, treating courses, or groups of homogeneous courses, separately.


Received July 12, 1955. 


References 


1. Bennett, G. K. Test of Mechanical Comprehen- 
sion, Form AA. Manual. New York: Psy- 
chological Corporation, 1951. 

2. Fisher, R. A. Statistical methods for research workers. (11th Ed.) London: Oliver & Boyd, 1950.

3. Likert, R., & Quasha, W. H. The Revised Minnesota Paper Form Board Test Manual. New York: Psychological Corporation, 1948.

4. Otterness, W. B., Patterson, C. H., Johnson, R. H., 
& Peterson, L. R. Trade school norms for 
some commonly used tests. J. appl. Psychol., 
1956, 40, 57-60. 

5. Patterson, C. H. Tests and background factors 
related to drop-outs in an industrial institute. 
Unpublished doctor’s dissertation, Univer. of 
Minnesota, 1955. 

6. Rao, C. R. Advanced statistical methods in bio- 
metric research. New York: Wiley, 1952. 

7. Science Research Associates. Examiner Manual 
for the Army General Classification Test, 
First Civilian Edition. Revised. Chicago: 
Science Research Associates, 1948. 


The Journal of Applied Psychology 
Vol. 40, No. 3, 1956 


The Selection of Graduate Students in Public Health 
Education 


Richard P. Barthol
Pennsylvania State College

and Barbara A. Kirk
University of California Counseling Center, Berkeley


This study was undertaken to determine whether tests could significantly improve the procedures used to select students for admission to a graduate program leading to an M.P.H. degree in Public Health Education, and to specify how these tests should be used.

In 1952 students were admitted to this program on the basis of academic records, references, and other historical data. A selection and validation program was started at that time by the University Counseling Center; it covers three separate classes in public health education and is limited to students of English-speaking heritage. The first two classes, Class A (N = 20) and Class B (N = 11), were selected by members of the School faculty without access to the test data. These students were tested at the time of admission, and the test data were filed. Class C (N = 11) was selected partially on the basis of the test results. Formal analysis of the test data was not started until Class C had completed its academic year.


The Test Battery and Criteria


The academic preparation and subsequent work of the public health educator were investigated. A test which had not yet been published, the Concept Mastery, was selected as a predictor of academic success.¹ The investigation also indicated that a stable personality structure and a genuine interest in welfare and in working with people were required. The Strong Vocational Interest Blank and the Minnesota Multiphasic Personality Inventory were selected to meet these needs.

1 This test was developed by Dr. Lewis M. Terman and associates for follow-up of his gifted group. It has been used with advanced graduate students, and will be published by the Psychological Corporation.


For criteria, rankings of academic progress 
by members of the faculty were used. Two 
subrankings proved useful—organizational 
ability and facility with interpersonal rela- 
tions. (Grades were not used because the 
range of grades at the graduate level was so 
restricted.) The students in each class were 
ranked independently by three raters at the 
close of each academic year. In addition, 
students in Classes A and B were evaluated 
after they had been working out in the field. 
These latter rankings were based partially on the reports of the field supervisors, and partially on personal observation.²

Scores on an achievement test, the Ameri- 
can Public Health Association Examination, 
were available. Classes A and B were given 
the standard examination at the beginning 
and the end of the academic program. Class 
C was given an abridged version of this test. 
The test consists of 300 multiple-choice ques- 
tions covering the fields in public health edu- 
cation. The abridged version, 110 questions, 
does not have the national norms available 
for the standard examination. It must be 
acknowledged that the APHA Examination 
was not considered a predictor of future aca- 
demic success, but only a measure of past 
academic achievement. The pre- and post- 
tests had been used by the School to assist 
the staff in planning and evaluating the gradu- 
ate program. Only the pretest was used in 
this study. 

Methods and Procedure 


Rank-order correlation coefficients were obtained for the Concept Mastery and APHA examinations with the criteria. Cut-off scores were obtained for each. The mean score of the APHA norm group was used for that test. A raw score of 55 was selected for the Concept Mastery, based on the rankings of Class A.

2 Dr. Dorothy B. Nyswander and Dr. William Griffiths, who initiated the study, and who had the foresight and patience to allow it to be done in the fashion indicated, did most of the rankings. Miss Sarah Mazelis also contributed her time, experience, and understanding to this project.

A far more difficult task was the determination of 
selection scores and profiles for the Strong and 
MMPI. First, a clinical analysis of the results for 
each student was made by staff members of the 
Counseling Center skilled in this technique, and each 
student was placed in an accepted or rejected group. 
The Strong and MMPI scores for each student were 
color coded and posted to a master profile sheet 
which was examined for patternings to see if a spe- 
cific score or pattern would predict success or failure. 

The hypotheses developed for one class could be 
immediately checked with the other two classes. 
Two statistical techniques were used. One was to 
use a test as a screening device and then obtain a 
rank-order correlation for the remaining students 
between another test and the criteria. The other 
method was to divide the class into two groups at 
the cutting point of the test and then use the Mann- 
Whitney U test to see if the passed group were 
ranked significantly higher than the rejected group. 
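Both techniques described above work on ranks. The sketch gives Spearman's rank-order coefficient using the no-ties formula ρ = 1 − 6Σd²/(n(n² − 1)) and the Mann-Whitney U for a class split at a test cutting score. It assumes untied scores and a criterion "standing" in which a larger number is better (standing = n + 1 − faculty rank when rank 1 is best); the class data are hypothetical, and significance would still be read from U tables or a normal approximation.

```python
def _ranks(values: list[float]) -> list[int]:
    """Rank from 1 (smallest) to n; assumes no ties."""
    order = sorted(range(len(values)), key=values.__getitem__)
    ranks = [0] * len(values)
    for rank, index in enumerate(order, start=1):
        ranks[index] = rank
    return ranks


def spearman_rho(x: list[float], y: list[float]) -> float:
    """Spearman rank-order correlation, 1 - 6*sum(d^2) / (n(n^2 - 1)), no ties."""
    rx, ry = _ranks(x), _ranks(y)
    n = len(x)
    d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1 - 6 * d2 / (n * (n * n - 1))


def mann_whitney_u(group_a: list[float], group_b: list[float]) -> float:
    """U for group_a: count of (a, b) pairs with a > b, ties counting one half."""
    return sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in group_a for b in group_b)


if __name__ == "__main__":
    # Hypothetical class of eight: test scores and faculty standing (8 = best).
    scores = [62, 48, 70, 55, 41, 66, 59, 50]
    standing = [6, 2, 8, 5, 1, 7, 4, 3]
    print("rho =", round(spearman_rho(scores, standing), 3))
    admitted = [s for s, t in zip(standing, scores) if t >= 55]
    rejected = [s for s, t in zip(standing, scores) if t < 55]
    print("U =", mann_whitney_u(admitted, rejected), "of", len(admitted) * len(rejected))
```

A U equal to the product of the two group sizes means every retained student outranked every rejected one.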


Results 


The criteria measures (ranking of the stu- 
dents by the faculty) were accepted at face 
value as being appropriate measures of suc- 
cess. Three judges were used for ranking the 
students in academic success. Coefficients of 
concordance were computed and in every case 
the null hypothesis was rejected at the .01 
level of probability. Only two judges were 
qualified to make the postacademic rankings, 
so rank-order correlation was used. The co- 
efficients for Classes A and B were .95 and 


Richard P. Barthol and Barbara A. Kirk 


.87, respectively, both significant at the .01
level. The rankings for each class were com- 
bined to form the criteria. The rankings of 
academic success were compared with the 
rankings of postacademic success, and cor- 
relations of .73 and .75 for Classes A and B 
were obtained, both significant at the .01 
level. It was felt that the measures were 
shown to be stable. 
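Agreement among the three judges is what Kendall's coefficient of concordance W measures; the text reports that coefficients of concordance were computed but does not give the formula. A minimal sketch, assuming untied ranks and the usual large-sample test χ² = m(n − 1)W on n − 1 degrees of freedom; the example rankings are hypothetical.

```python
def kendall_w(rankings: list[list[int]]) -> tuple[float, float, int]:
    """Kendall's coefficient of concordance for m judges ranking n students.

    rankings[j][i] is the rank judge j gave student i; ranks are assumed untied.
    Returns (W, chi_square, df), with chi_square = m(n - 1)W on n - 1 df.
    """
    m, n = len(rankings), len(rankings[0])
    totals = [sum(judge[i] for judge in rankings) for i in range(n)]
    mean_total = m * (n + 1) / 2
    s = sum((t - mean_total) ** 2 for t in totals)
    w = 12 * s / (m ** 2 * (n ** 3 - n))
    return w, m * (n - 1) * w, n - 1


if __name__ == "__main__":
    # Hypothetical rankings of six students by three judges.
    judges = [[1, 2, 3, 4, 5, 6], [2, 1, 3, 5, 4, 6], [1, 3, 2, 4, 6, 5]]
    w, chi2, df = kendall_w(judges)
    print(f"W = {w:.3f}, chi-square = {chi2:.2f} on {df} df")
```

With only two judges, as for the postacademic rankings, a single rank-order correlation serves the same purpose.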

Table 1 contains a summary of the impor- 
tant relationships found among the tests and 
the criteria, and in general indicates that 
Class B tended to conform to normal expec- 
tations: the students with the best back- 
ground, the highest previous achievement, 
and the highest level of mental ability were 
the better students and the better workers. 
This held true to a lesser extent for Class A. 
Class C, of small number, did not follow this 
pattern. Two students with the most aca- 
demic potential produced the least, because 
of emotional stress at this period, as reported 
by the staff. 

Table 2 shows the effectiveness of the four 
tests had they been used as screening devices. 
The Mann-Whitney U test was used to de- 
termine whether the group that would have 
been admitted had significantly higher rank- 
ings than the group that would have been re- 
jected. None of these hurdles significantly 
affected Class C. This was anticipated, since 
the same tests had already been used for selection, although with different standards.


Table 1
Rank-Order Correlations of Tests and Criteria

                                     Class A        Class B            Class C
Variables                            N     ρ        N     ρ            N     ρ
APHA and academic rank              17   .62**     11   (illegible)    9   -.71*
APHA+MMPI and academic rank†        15   .80**      9   .85**         (illegible)  .18
APHA and job performance            17   .50*       9   .70*           n.a.
APHA+MMPI and job performance†      15   .51*       6   .80            n.a.
CM and academic rank                20   .18       11   .53*          10   -.41
CM and job performance              20   .00        9   .32            n.a.
CM and APHA                         17   .42*      11   .89**          8   -.06
Experience and job performance      20   .41*      (illegible)         n.a.

* p < .05.
** p < .01.
† The MMPI was used for screening. The correlation is between the APHA and the criterion.
n.a. = not available.


Selection of Graduate Students 161 
Table 2
Significance of Difference of Rankings of Classes Divided into Two Groups by Single Test Results

                                                       Significance* (p <)
                                                   Class A    Class B    Class C
Test              Basis for Division               N = 20     N = 11     N = 11
MMPI              Any score (except Mf) > 70        .01        N.S.       N.S.
Strong            Clinical evaluation               N.S.       .05        N.S.
Strong            OL < 49 or MF > 55 (men only)     .05        N.S.       N.S.
CM                Raw score 55                      N.S.       N.S.       a
APHA exam         Mean of APHA norm group           .01        .01        n.a.
MMPI+Strong+CM    See Table 4                       .01        .01        N.S.

* Mann-Whitney U test.
a No scores below 55.
N.S. = not significant; n.a. = not available.


Students with scores above the cutting score of 55 on the Concept Mastery were not ranked significantly higher than were the students with scores below 55. Scores ranged from 30 to 157. Students with scores above 100 were usually found in the top third of the class, but three of the best students had scores well below 100.

The MMPI was used to eliminate students 
who had any standard scores above 70, ex- 
cept on the Mf scale. The retained group 
was significantly better than the rejected 
group for Class A, but the differences were 
not significant for the other two classes. 

When the classes were divided by a clini- 
cal evaluation of the Strong, only in Class B 
was the retained group significantly better 
(.05) than the rejected group. 

Based on the analysis of the Strong com- 
bined profiles, male students were considered 
to be rejected with an OL score below 49 and 
an MF score above 55. (This was appro- 
priate for men only, since there is no OL 
score for women.) Table 2 indicates that
only in Class A was the high OL-low MF 
group significantly better (.05) than the low 
OL-high MF group. Inspection of the data 
for the other two classes suggested that the 
hypothesis might have been supported in 
these classes had the N’s not been so very 
small (7 and 6, respectively). 

The analysis of the Strong indicated that 
many different profiles could be associated 
with success in the public health education 
graduate curriculum. Apparently most male 


applicants for this particular School, whether 
they ultimately succeeded or failed in the pro- 
gram, tended to have high scores in Group V 
(welfare). On the other hand, apparently, 
they do not have interests similar to Group II 
(the physical sciences). The only Strong 
group that seemed to be positively related 
to success was Group X (verbal-linguistic). 
Five of the best eight students scored A in 
Group X, while only one of the worst eight 
and one of the middle ten had A's in Group X.

No clear-cut pattern emerged from the 
Strong test for women. Although high scores 
on the Social Worker, Psychologist, and Law- 
yer scales seem to be appropriate, several of 
the lowest-rated women received high scores 
in these three areas. High scores in the busi- 
ness and domestic occupations, when not sup- 
ported by other interests, seemed to be inap- 
propriate, but the evidence was not strong. 
It is possible that scores above 55 on the FM 
scale for women would indicate an inappro- 
priate pattern, but the number of cases is too 
small to make any firm statement. 

Those test results that seemed most useful were abstracted. Tables 3 and 4 show the characteristics for each test that seemed to predict either success or failure in the public health education program. Also included are the student background characteristics that seemed to be related to success or failure. It is apparently much easier to use the tests for screening out than for predicting success; that is to say, it is easier to tell which students are likely to fail than which students are likely to be at the top of the class. When all of the negative principles were applied to Classes A and B, only six of the original twenty remained in Class A, and three of the original eleven in Class B. Four of the best five students were retained in Class A, and three of the best five were retained in Class B. Had the suggested criteria been used without reference to any other factors, the selected group would have been very much smaller, but significantly better than the rejected group. In Class C, five middle-ranked students would have been rejected had the tests been used again with the additional negative weightings. Since none of the best students was dropped, the suggested negative criteria would probably be more rigorous than the former methods but would still pass the best students.

Table 3
Variables Indicating Success in Curriculum

Instrument               Sign
Strong                   A's in Group X
                         OL > 55, MF < 49
                         Clinical evaluation
MMPI                     Nothing found
Concept Mastery          Scores above 100
APHA examination         Scores above mean of APHA norm group
Background evaluation    Academic preparation in public health
                         Work experience in public health

Table 4
Variables Indicating Failure in Curriculum

Instrument               Sign
Strong                   Depressed profile
                         High manipulative w/o other support
                         (Men) MF above 55, OL below 49
                         (Women) High domestic or business w/o other support
MMPI                     Any score (except Mf) above 70
                         Clinical appraisal
Concept Mastery          Scores below 55
APHA examination         Scores below mean of APHA norm group
Background evaluation    No academic preparation in either biological sciences or public health
                         No work experience in public health, medical or social service occupations

The value of the three tests as aids in selection was subjectively supported by members of the School faculty, who stated that Class C was the best group of students they had had. They further expressed approval of the selections that would have been made had the tests been used for all three classes. Obviously this subjective evidence is not critical; the important fact is that it is not at variance with the statistical evidence.


Discussion


This study indicates once again that a testing program, even though carefully and logically designed, must undergo an empirical analysis. The Strong, usually considered a powerful selection device, was not shown to be of great value. There did not seem to be any scale or combination of scales that would pick out the interest pattern of successful students in this field; there were, however, patterns that seemed to be contraindicators. The MMPI was useful for negative screening but did not seem to be able to predict success. The Concept Mastery had a cutting score for elimination and also a higher score that was a good predictor of success; scores between the two had no predictive value. The APHA, which was in the battery only by chance, not only had a good cutting score but also ranked the students in the approximate order of later success.
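The failure signs in Table 4 that are stated as thresholds can be applied mechanically; the clinical appraisals, the Strong profile judgments other than the OL and MF cut, and the background evaluation cannot, and are left out. A minimal sketch, assuming a hypothetical applicant record with the field names shown. The article does not give the APHA norm-group mean, so it is a required parameter, and the OL/MF rule follows the text (OL below 49 and MF above 55, men only).

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Applicant:
    """Hypothetical applicant record; the field names are illustrative only."""
    name: str
    sex: str                    # "M" or "F"
    concept_mastery: int        # raw score
    apha: float                 # APHA Examination score
    mmpi_t: dict[str, int] = field(default_factory=dict)  # scale name -> T score
    strong_ol: Optional[int] = None  # Strong OL score (men's blank only)
    strong_mf: Optional[int] = None  # Strong MF score


def negative_signs(a: Applicant, apha_norm_mean: float) -> list[str]:
    """List the threshold-type failure signs from Table 4 that apply to `a`."""
    signs = []
    if any(t > 70 for scale, t in a.mmpi_t.items() if scale != "Mf"):
        signs.append("MMPI score above 70 (Mf excluded)")
    if a.concept_mastery < 55:
        signs.append("Concept Mastery below 55")
    if a.apha < apha_norm_mean:
        signs.append("APHA below norm-group mean")
    if a.sex == "M" and a.strong_ol is not None and a.strong_mf is not None:
        if a.strong_mf > 55 and a.strong_ol < 49:
            signs.append("Strong MF above 55 and OL below 49")
    return signs


if __name__ == "__main__":
    # Hypothetical applicant and a placeholder norm-group mean.
    applicant = Applicant("B", "M", 50, 120.0, {"Mf": 80, "D": 64}, strong_ol=45, strong_mf=60)
    print(negative_signs(applicant, apha_norm_mean=140.0))
```

An empty list means the applicant survives the threshold screens; as the discussion notes, surviving them did not guarantee success, and failing them did not rule it out.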

Post facto reasoning indicates that, since the APHA Examination is an achievement test, most students with inadequate background in public health education would tend to make low scores. A low score could predict failure either because the background is quite important or because applicants with long-term interests in the field make the best students. On the other hand, if a student had the proper background but still made a low score, it might indicate a lack of ability to apply himself. Superior ability of this sort might be indicated when a student without much background scores well on the APHA.

The other three tests in combination worked well in eliminating poor students but lacked discrimination, since they also screened out some of the very good students. If there are many more applications than openings, this is not important, since good students would be rejected anyway. If there is enough space to absorb all good applicants, then harm is done both to the student and to the program. This problem has not yet been resolved, since we have not been able to find the characteristics that distinguish the good students with negative test results from the poor students with negative test results.

Except for two low-achieving students in 
Class C, all who survived the tests did well. 
The two students were investigated; they 
both had unanticipated medical problems 
which interfered with their success. One of 
the highest students in Class A had negative 
indicators on all of the tests. Because of in- 
sufficient clinical information, we have no hy- 
potheses about why this occurred. 


Received September 12, 1955. 


The Journal of Applied Psychology 
Vol. 40, No. 3, 1956 


A Comparison of Successful and Unsuccessful Students in 
the Medical School at the University of Minnesota 


Vivian H. Hewer 


University of Minnesota 


The Problem 


Faculty and administrators of the Medical 
School at the University of Minnesota have 
become increasingly concerned with the num- 
ber of students dropped from the Medical 
School during recent years. Poor academic 
achievement and inadequate personal adjust- 
ment are two reasons cited for the loss. 

In an effort to reduce this loss, an attempt 
is being made to improve selection of stu- 
dents admitted to medical school. For a 
number of years, psychological tests including 
the Professional Aptitude Test, now known as 
the Medical College Admission Test, and the 
Minnesota Medical Aptitude Test, as well as 
other methods, were used to select students. 
A marked change was made in the kinds of 
psychological tests required of students seek- 
ing admission to the freshman class in the 
fall of 1954. Whereas formerly the emphasis 
in testing had been on medical aptitude, the 
decision was made at that time to evaluate 
two other psychological attributes, interests 
and personality adjustment, by tests. The 
Strong Vocational Interest Blank (SVIB) 
was selected to measure vocational interest, 
and the Minnesota Multiphasic Personality 
Inventory (MMPI) to measure personality 
adjustment. The Miller Analogies Test (Form H) was also added as a test of academic capacity. The Minnesota Medical Aptitude Test was discontinued, and the Medical College Admission Test was retained.

There are scattered research data, but very few bear on how well the newly selected tests predict success or persistence of students in medical training. Studies by Dvorak (1), Melton (5), and Hewer (4) found that the score on the SVIB Physician key did not contribute to prediction of success in premedical training. Schofield (6), using the MMPI and equating the ability level, reported that medical students with superior grades had significantly lower scores on some scales of the test than did those with inferior grades. Glaser (2) also studied the relation of scores on the MMPI and Miller Analogies (Form G) to success in medical school.

This research is concerned with the com- 
parison, by various methods, of the scores 
made on tests by a group of successful and a 
group of unsuccessful medical students. The 
purpose is to determine whether any of the 
tests, including the newly selected SVIB and 
MMPI, can be used to predict success in 
medical school. 


Method 
Sample 


The successful group was composed of men from 
two classes of medical students, those who entered 
in the fall of 1951 and who had successfully com- 
pleted two years of medical school (N = 115), and 
those who entered in the fall of 1952 and who had 
successfully completed one year of medical school 
(N=110). These data were taken from a paper 
by Smith (7), summarizing test performance of a 
group of successful medical students. The unsuc- 
cessful group (N =29) was composed of all male 
students dropped from medical school because of 
scholastic failure during a five-year period, 1949- 
1953. Those students who were re-admitted and 
were successful on a second trial in making a satis- 
factory average were excluded from the sample. All 
the available test data on both groups were secured 
from the files of the Student Counseling Bureau at 
the University of Minnesota. 


Procedure 


Three different approaches to the analysis of test results were used in this study. In the first, tests were applied to determine whether a significant or reliable difference exists between the mean scores made by the two groups on a variety of tests. The t test was used, or the Behrens-Fisher d test when an F test showed that the variances of the two groups were not homogeneous. The variables tested in this part of the study were the following:


1. High School Rank (HSR), the percentile rank of the student in his own high school class.

2. Total Premedical Honor Point Ratio (Total Pre-Med HPR), which represents grades earned in all courses taken in premedical training.¹

3. Required Science Premedical Honor Point Ratio (Req Sc Pre-Med HPR), which represents grades earned in all science courses required for entrance to medical school.

4. American Council on Education Psychological Examination, 1947 form (ACE '47), a college aptitude test.

5. Cooperative English Test, Form S, lower level (Coop Eng S), a test of achievement in English.

6. Professional Aptitude Test (PAT), now known as the Medical College Admission Test. The four parts studied were verbal ability, quantitative ability, modern society, and medical science.

7. Minnesota Medical Aptitude Test (Minn Med Apt); only the total score was analyzed.

8. Strong Vocational Interest Blank (SVIB); only the score on the Physician key was used.

9. Minnesota Multiphasic Personality Inventory (MMPI).

Table 1
F Test and t Test Comparisons on Mean Scores on Tests for Successful and Unsuccessful Medical Students

                           Successful       Unsuccessful
Test                       N     Mean       N     Mean        F             t
1. HSR                     187    87.2      25     81.7      1.07          1.69*
2. Total Pre-Med HPR       213     2.2      29      1.9      1.47          4.44***
3. Req Sc Pre-Med HPR      213     2.2      29      1.8      1.38          3.63***
4. ACE '47                 184   126.6      26    117.9      1.18          2.45**
5. Coop Eng S              187   206.1      27    198.2      (illegible)   1.43
6. Prof Apt
   a. Verbal Ability       206   538.7      26    531.3      1.17           .41
   b. Quant Abil           206   579.6      26    545.2      1.08          1.85*
   c. Mod Soc              206   555.4      25    553.0      1.26           .14
   d. Med Sc               206   556.0      26    527.9      1.08          (illegible)
7. Minn Med Apt            208   168.4      27    156.7      1.29          2.60***
8. Phys Key                106    43.0      19     42.1      1.21           .30

* p < .10.
** p < .05.
*** p < .01.
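The two-step rule in the Procedure (an F test on the variances, then either the ordinary t test or the Behrens-Fisher d test) can be sketched from summary statistics. The d statistic is (M₁ − M₂)/√(s₁²/n₁ + s₂²/n₂); its significance was originally read from Fisher-Behrens tables, and this sketch instead attaches Welch's approximate degrees of freedom, a different approximation named here plainly. The F critical value is passed in rather than computed, and the standard deviations in the example are hypothetical because the article reports only means.

```python
from math import sqrt


def variance_ratio(sd_a: float, sd_b: float) -> float:
    """F = larger variance / smaller variance."""
    va, vb = sd_a ** 2, sd_b ** 2
    return max(va, vb) / min(va, vb)


def pooled_t(m_a, sd_a, n_a, m_b, sd_b, n_b):
    """Student's t with pooled variance; returns (t, df) with df = n_a + n_b - 2."""
    pooled_var = ((n_a - 1) * sd_a ** 2 + (n_b - 1) * sd_b ** 2) / (n_a + n_b - 2)
    return (m_a - m_b) / sqrt(pooled_var * (1 / n_a + 1 / n_b)), n_a + n_b - 2


def behrens_fisher_d(m_a, sd_a, n_a, m_b, sd_b, n_b):
    """d = (m_a - m_b) / sqrt(s_a^2/n_a + s_b^2/n_b), with Welch's approximate df."""
    qa, qb = sd_a ** 2 / n_a, sd_b ** 2 / n_b
    d = (m_a - m_b) / sqrt(qa + qb)
    df = (qa + qb) ** 2 / (qa ** 2 / (n_a - 1) + qb ** 2 / (n_b - 1))
    return d, df


def compare_means(m_a, sd_a, n_a, m_b, sd_b, n_b, f_critical):
    """Use the d test when the F ratio exceeds `f_critical`, else the pooled t."""
    if variance_ratio(sd_a, sd_b) > f_critical:
        return ("d",) + behrens_fisher_d(m_a, sd_a, n_a, m_b, sd_b, n_b)
    return ("t",) + pooled_t(m_a, sd_a, n_a, m_b, sd_b, n_b)


if __name__ == "__main__":
    # HSR means and Ns from Table 1; the SDs and the F critical value are placeholders.
    print(compare_means(87.2, 12.0, 187, 81.7, 14.0, 25, f_critical=1.6))
```

When the two variances and sample sizes are equal, the d statistic and Welch's degrees of freedom coincide with the pooled t and its n₁ + n₂ − 2 degrees of freedom.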

The second analysis was the application of the 
chi-square test to the frequencies within the two 
groups of the occurrence of patternings of elevation 
of MMPI scales. Hathaway (3) suggests a rather 
elaborate system of coding which takes into account 
the degree of elevation of MMPI scales. A very 
simple approach was used in this study. Code num- 
bers were attached to the two scales with the high- 
est scores regardless of degree of elevation. For ex- 
ample, a profile with Pd and Ma the highest points 
was coded 49. The codes were then tallied, tallying 
each scale, rather than combination of two scales. 
It was to these frequencies that the chi-square test 
was applied. 
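The simplified MMPI coding described above is easy to automate. A minimal sketch, assuming the conventional Hathaway scale numbers (Hs = 1 through Ma = 9, Si = 0), which match the article's example of Pd and Ma coded 49. Whether the study ordered the two digits by height, and whether Si was included, are not stated; here the higher scale comes first and Si is counted if present. The per-scale tallies for each group could then go into a chi-square test.

```python
from collections import Counter

# Conventional MMPI clinical-scale code numbers (Si is coded 0).
SCALE_NUMBER = {"Hs": "1", "D": "2", "Hy": "3", "Pd": "4", "Mf": "5",
                "Pa": "6", "Pt": "7", "Sc": "8", "Ma": "9", "Si": "0"}


def two_point_code(profile: dict[str, float]) -> str:
    """Code numbers of the two highest clinical scales, higher score first.

    Elevation is ignored; ties keep the order of SCALE_NUMBER (sorted() is stable).
    """
    scales = [s for s in SCALE_NUMBER if s in profile]
    top_two = sorted(scales, key=lambda s: profile[s], reverse=True)[:2]
    return "".join(SCALE_NUMBER[s] for s in top_two)


def tally_scales(codes: list[str]) -> Counter:
    """Count each scale separately, not each two-scale combination."""
    return Counter(digit for code in codes for digit in code)


if __name__ == "__main__":
    # Hypothetical T scores for one profile.
    print(two_point_code({"Hs": 50, "D": 55, "Pd": 72, "Ma": 68, "Sc": 60}))
    print(tally_scales(["49", "94", "27"]))
```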


1 Honor point ratio is figured with A=3, B=2, 
C=1, D=0, F=0 honor points. 
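The honor point ratio is a grade-point average on this 3-point scale. A minimal sketch; weighting each course by its credits is an assumption, since the footnote gives only the point values, and the record in the example is hypothetical.

```python
HONOR_POINTS = {"A": 3, "B": 2, "C": 1, "D": 0, "F": 0}


def honor_point_ratio(courses: list[tuple[str, float]]) -> float:
    """Honor points earned per credit attempted (credit weighting is assumed)."""
    points = sum(HONOR_POINTS[grade] * credits for grade, credits in courses)
    attempted = sum(credits for _, credits in courses)
    return points / attempted


if __name__ == "__main__":
    # Hypothetical premedical record: (grade, credits).
    record = [("A", 5), ("B", 5), ("C", 5), ("F", 5)]
    print(honor_point_ratio(record))
```

Because D and F both earn zero honor points, a D lowers the ratio exactly as much as an F.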


In the third and final analysis, SVIB and MMPI profiles were prepared for the successful and unsuccessful medical students, although these data were not available for the total sample. Three judges² were asked to sort the SVIB profiles into two groups, indicating those whom they would and would not recommend for acceptance in medical school on the basis of interest measurement. MMPI profiles were sorted by two clinical psychologists³ into three groups (accept, reject, and hold), again with regard to recommendation for medical school. Chi-square tests were then applied to determine whether the judges could identify the successful and unsuccessful students through a blind sort of the test profiles.


Results 


Results of t tests on the first eight variables studied are presented in Table 1. The difference between the total premedical honor point ratios of successful and unsuccessful medical students is significant at the .01 level. The difference between the mean scores of successful and unsuccessful students on the Minnesota Medical Aptitude Test is also significant at the .01 level, and the difference on the ACE is significant at the .05 level.

2 The writer is very grateful to Dr. Ralph Berdie, Dr. Theda Hagenah, and Dr. Wilbur Layton for their assistance in this part of the study.

3 The writer is very grateful to Dr. Starke Hathaway and Dr. William Schofield for their assistance in this part of the study.

Differences in mean score between the two groups were tested on each scale of the MMPI. The L-scale score is the only one that differentiates the two groups at the .05 level. It should be added that in this analysis d tests were applied to scores on the L (lie), D (depression), and Pd (psychopathic deviate) scales, because F tests indicated lack of homogeneity of variance.
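The variance check described above can be sketched as follows. The F ratio is the larger sample variance divided by the smaller one. When it exceeds a cutoff, a separate-variance (Welch) t replaces the pooled t. The source calls its alternative a "d test" without defining it, so the Welch statistic is a swapped-in illustration rather than the author's method. The `f_critical` cutoff is a hypothetical parameter; in practice it would be read from an F table for the relevant degrees of freedom.

```python
import math
from statistics import mean, variance


def f_ratio(a, b):
    """Larger unbiased sample variance divided by the smaller."""
    va, vb = variance(a), variance(b)
    return max(va, vb) / min(va, vb)


def pooled_t(a, b):
    """Student's t assuming equal population variances."""
    na, nb = len(a), len(b)
    sp2 = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    return (mean(a) - mean(b)) / math.sqrt(sp2 * (1 / na + 1 / nb))


def welch_t(a, b):
    """Separate-variance t, used here when the variances appear unequal."""
    se = math.sqrt(variance(a) / len(a) + variance(b) / len(b))
    return (mean(a) - mean(b)) / se


def compare_groups(a, b, f_critical):
    """Choose the statistic according to a (hypothetical) F cutoff."""
    if f_ratio(a, b) > f_critical:
        return "separate-variance", welch_t(a, b)
    return "pooled", pooled_t(a, b)
```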

Casual inspection of the MMPI profiles created the impression that the profiles of the unsuccessful students were more deviant than those of the successful students. The analysis above, which compared the two groups' scores on specific scales, revealed no significant differences except on the L scale. A second way to check this impression was to code the MMPI patterns in the manner described under Procedure. Inspection of the percentage distributions of scale frequencies suggested, for example, that a significantly higher percentage of the unsuccessful students than of the successful students might have elevated Pd scores. The reverse appeared to be true of the Ma scale. The results of the chi-square test applied to check this hypothesis are not conclusive: the probability falls between the .05 and .10 levels. There is, however, at least a suggestion that unsuccessful medical students may differ from successful students in personality organization as measured on the MMPI.

In the third analysis, the blind sort of SVIB profiles, there was a high degree of agreement among the three judges. Judge I agreed with Judge II 89 per cent of the time and with Judge III 91 per cent of the time. Judges II and III agreed 89 per cent of the time, and all three judges agreed 85 per cent of the time. However, the judges could not identify the medical students who failed by inspecting their SVIB profiles. Of the 19 students who failed, the three judges agreed that 15, or 79 per cent, should have been accepted in medical school. They agreed to reject 37, or 35 per cent, of the successful students. Table 2 reports chi-square tests of the agreement between the judges' ratings of SVIB profiles and the criterion.

Table 2

Chi-Square Tests of the Agreement Between Judges' Ratings of SVIB Profiles and Criterion

                                   Criterion
Judges       Rating          Successful   Unsuccessful   Chi-square
Judge I      Successful          51            16
             Unsuccessful        54             3          8.23**
Judge II     Successful          64            15
             Unsuccessful        41             4          2.25
Judge III    Successful          60            16
             Unsuccessful        45             3          4.97*
Agree        Successful          50            15
             Unsuccessful        37             3          4.23*

* p < .05.
** p < .01.

Two of the chi-square tests are significant at the .05 level, and one at the .01 level. Note, however, that the results run in the direction opposite to what might be expected. In other words, the judges not only failed to identify, from their SVIB profiles, the medical students who failed. They also agreed in labeling as unsuccessful those who succeeded, and as successful those who failed (.05 level).
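The chi-square values in Table 2 can be reproduced with the ordinary Pearson statistic for a contingency table, computed without Yates's correction. Judge I's ratings are: rated successful, 51 successful and 16 unsuccessful; rated unsuccessful, 54 and 3. For these the sketch gives about 8.23, which matches the table. That suggests, but does not confirm, that no continuity correction was applied. The function name is hypothetical.

```python
def pearson_chi_square(table):
    """Pearson chi-square for an r x c table of observed counts (no Yates correction).

    Returns (chi_square, degrees_of_freedom).
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return chi2, df


# Judge I, Table 2: rows = judge's rating, columns = criterion (successful, unsuccessful).
judge_1 = [[51, 16],
           [54, 3]]
chi2, df = pearson_chi_square(judge_1)
print(round(chi2, 2), df)  # 8.23 1 (above 6.63, the .01 critical value for 1 df)
```

The same function accepts the 2 x k frequency tables from the scale-by-scale MMPI tally.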

The two judges who sorted the MMPI pro- 
files agreed 82 per cent of the time. Here 
again it was impossible to identify the un- 
successful medical students. Of the 17 un- 
successful students, both judges agreed they 
would recommend acceptance of 11, or 65 
per cent of them. These judges would not, however, reject many of the successful students, nor, for that matter, many of the unsuccessful ones. Chi-square tests applied to these data
are presented in Table 3. In the fourth test, 
those cases which the judges agreed should 
be rated “hold” and “reject” were combined 
in the unsuccessful group. There is no indi- 
cation that unsuccessful medical students can 
be identified from their MMPI profiles, since 
all chi-square tests indicate a chance ranking 
of the profiles when compared to the cri- 
terion. 

One other comparison was made: a check of how many students would be rejected on the basis of both MMPI and SVIB. All five judges would have rejected a student on both SVIB and MMPI in only one case, and that student was successful.




Discussion 


Successful medical students have significantly higher premedical honor point ratios than unsuccessful medical students. This holds both when grades in all premedical courses are considered and when only grades in required science courses are considered. The result gives further support for the practice of emphasizing premedical grades in selection. The analysis of scholastic aptitude tests suggests that successful medical students as a group have a significantly (.05 level) higher mean score on the ACE than unsuccessful students. The successful students also made significantly higher scores on the Minnesota Medical Aptitude Test, but scores on the PAT did not differentiate the two groups.

The significantly higher L score on the MMPI for unsuccessful medical students suggests greater defensiveness among them in responding to MMPI items. This defensiveness may have obscured basic differences in the total profile. Practicing counselors have also suggested that the L scale may have clinical value in describing unsophisticated persons with poor self-understanding and insight. Interestingly, Schofield (6) found that a group of low-achieving medical students had significantly higher L scores than a group of high-achieving students at the same ability level. The similarity of these findings may stimulate further research in this area. It does appear, however, that the L score may serve as an indicator of unrealistic attitudes toward self that are in some way related to academic achievement.

The chi-square test applied to the distribution of MMPI score patterns gave inconclusive evidence of differences in personality patterning between the two groups. The unsuccessful students seemed to have a disproportionate number of high Pd scores, suggesting individuals of low social concern, poorly organized goals, and general immaturity. This hypothesis, however, needs further checking.

Experienced psychologists were unable to identify successful and unsuccessful medical students from either SVIB or MMPI profiles. This is perhaps not too surprising, since the unsuccessful students were those dropped for academic failure. These tests may serve to predict later adjustment to the profession, a hypothesis that could be checked only through follow-up. Medical school administrators and faculty want to select students who can meet the academic requirements of training and who can also practice medicine successfully.

Table 3

Chi-Square Tests of the Agreement Between Judges' Ratings of MMPI Profiles and Criterion

                                           Criterion
Judges                  Rating          Successful   Unsuccessful   Chi-square
Judge I                 Successful          78            12
                        Unsuccessful         6             1           .04
Judge II                Successful          74            12
                        Unsuccessful        10             2           .005
Agree                   Successful          68            11
                        Unsuccessful         5             1           .03
Agree (Unsuccessful     Successful          68            11
  and Hold)             Unsuccessful         8        [illegible]   [illegible]

Some students fail in medical school for nonscholastic reasons, such as lack of interest, emotional instability, or difficulty with interpersonal relationships. It would be expecting too much for the same tests to identify both those who failed for academic reasons and those who failed for other reasons. Further study of the latter group is required.

Conclusions 

This study compares the scores of successful (N = 225) and unsuccessful (N = 29) medical students on a group of tests. It was hoped that the results would help in evaluating current medical school selection procedures.

The following results were found: 

1. Successful medical students make significantly higher grades (.01 level) in their premedical courses than unsuccessful medical students. This is true both for the total honor point ratio and for the honor point ratio in the science courses required for medical school.


2. Successful medical students make sig- 
nificantly higher scores (.01 level) on the 
Minnesota Medical Aptitude Test and on the 
ACE (.05 level) than do unsuccessful stu- 
dents. 

3. Unsuccessful medical students had a significantly higher score (.05 level) on the L scale of the MMPI. This suggests defensiveness in responding to MMPI items or, possibly, low psychological maturity.

4. There is a suggestion (chi-square .10 > 
P > .05) that the general personality organi- 
zation, as measured on MMPI, may be differ- 
ent for successful and unsuccessful students. 

5. Experienced psychologists were unable 
to identify successful and unsuccessful medi- 
cal students through the use of either SVIB 
or MMPI profiles.


Received August 17, 1955.




References


1. Dvorak, Beatrice. Adjustment of pre-medical freshmen to the university. Unpublished master's thesis, Univer. of Minnesota, 1930.

2. Glaser, R. Predicting achievement in medical school. J. appl. Psychol., 1951, 35, 272-275.

3. Hathaway, S. R., & Meehl, P. E. An atlas for the clinical use of the MMPI. Minneapolis: Univer. of Minnesota Press, 1951.

4. Hewer, Vivian H. Vocational interest-achievement-ability inter-relationships at the college level. Unpublished doctor's thesis, Univer. of Minnesota, 1954.

5. Melton, R. Prediction of success of pre-medical freshmen at the University of Minnesota. Unpublished master's thesis, Univer. of Minnesota, 1951.

6. Schofield, W. A study of medical students with the MMPI: III. Personality and academic success. J. appl. Psychol., 1953, 37, 47-52.

7. Smith, Joyce S. Summary of test performance of medical students. Unpublished master's paper, Univer. of Minnesota, 1954.


The Journal of Applied Psychology
Vol. 40, No. 3, 1956


The Development and Standardization of a Preliminary Form of an Activity Experience Inventory: A Measure of Manifest Interest¹


Wm. Price Ewens 


Agricultural and Mechanical College of Texas


Studies of interests and of methods of collecting interest data have led to various classifications of interests (2, 6, 9). Expressed interests, manifest interests, inventoried interests, and tested interests are categories found in these sources. Although recognized as an interest type, manifest interest has received little attention from researchers concerned with interest measurement. Tests initially considered measures of manifest interest (4, 7, 8) have since been classified as measures of tested interest (6).

Manifest interest has been defined as being "synonymous with participation in an activity or an occupation" (6). In apparent agreement with this definition, Travers (9) states that "Manifest interests are determined by observing what the individual does in his spare time or perhaps at work." Determining manifest interest by observation for any sizable number of subjects is plainly a large and difficult task. It seems likely, however, that manifest interest could be determined through a self-report inventory.

Dressel and Matteson (1) used the Kuder Preference Record items to measure experience by changing the directions for the instrument. They concluded that this was an unsatisfactory method of measuring experience. In their opinion, an instrument designed specifically to measure experience would have to be developed.


Definition of Problem


The study on which this report is based had two major purposes. The first was to develop an experience inventory and establish normative data for it. The second² was to examine the relationships between experience, as measured by the instrument, and preference, as measured by the Kuder Preference Record. This report is concerned with the first of these purposes.

1 This study was conducted in the Department of Education of Stanford University under the direction of Dr. H. B. McDaniel. A more complete presentation of results may be found in the original study, "Experience patterns as related to vocational preference," unpublished doctor's dissertation, 1949, Stanford University Library, Stanford, California.

The problem of developing an experience 
inventory was further defined to give direc- 
tion in item selection and in inventory de- 
sign. The inventory was (a) to measure ac- 
tivity experience in the interest areas used by 
the Kuder Preference Record, Form BB; (b) 
to be objectively scored and give a composite 
experience score for each of the interest areas; 
(c) to include activity items within the prob- 
able experience of boys and girls of high 
school age; (d) to use vocabulary of high 
school level; (e) to include directions suffi- 
ciently specific to permit self-administration 
if necessary; and (f) to be administered and 
scored within fifty minutes. 


Procedure 


Development of the Activity Experience Inventory


Graduate students in counselor training at Stanford University wrote approximately 2,000 activity items. From these, a trial-form inventory was developed containing twenty-five activity items for each of the nine areas of the Kuder Preference Record. Subjects were to respond to each item on a five-step scale. The five steps were arbitrarily assigned weights from 0 through 4, with 0 representing no experience and 4 representing maximum experience.
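Scoring under this design can be sketched as below. The source does not spell out the scoring rule, so the sketch assumes that an area's composite experience score is the simple sum of the 0-4 weights over its twenty-five items. That gives a possible range of 0 to 100 per area. The data layout and function name are hypothetical.

```python
AREAS = ("Mechanical", "Computational", "Scientific", "Persuasive", "Artistic",
         "Literary", "Musical", "Social Service", "Clerical")
ITEMS_PER_AREA = 25
MAX_WEIGHT = 4  # 0 = no experience ... 4 = maximum experience


def composite_scores(responses):
    """responses: dict mapping area -> list of 25 item weights (0-4).

    Returns a dict mapping area -> summed weight (assumed scoring rule).
    """
    scores = {}
    for area in AREAS:
        weights = responses[area]
        if len(weights) != ITEMS_PER_AREA:
            raise ValueError(f"{area}: expected {ITEMS_PER_AREA} items")
        if any(not 0 <= w <= MAX_WEIGHT for w in weights):
            raise ValueError(f"{area}: weights must lie between 0 and {MAX_WEIGHT}")
        scores[area] = sum(weights)
    return scores
```

Validating the item count and weight range catches mis-keyed answer sheets before they distort an area total.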

A group of counselor trainees made constructive criticisms of the definitions of response categories, the statements, the appropriateness of the activity items, administrability, vocabulary, and the answer-sheet form. The trial form was administered to three sections of general psychology, to students in a measurements and evaluation class in a California college, and finally to a number of tenth-grade students in a California high school. These administrations provided further evaluation of the inventory against the criteria used for its construction.

2 "Experience patterns as related to vocational preference" (to appear, Educ. psychol. Measmt, 1956).


Statistical Analysis 


For statistical analysis of the Activity Experience 
Inventory and for tentative standardization, data 
were collected from students in three California high 
schools. The age and sex composition of the stand- 
ardization group is given in Table 1. Statistical 
analysis of the inventory included examination for 
validity, reliability and intercorrelation and the es- 
tablishment of normative data. 


Validity 

The following are some considerations that directly 
reflect on the validity of any attempt to measure ac- 
tivity experience. 

1. Activity items selected for an inventory at best 
represent an attempt to sample the experience back- 
ground of the individual being measured. 

2. The tendency of a subject to underestimate or 
overestimate, whether conscious or unconscious, is a 
general criticism of rating-scale techniques and will 
be reflected in the validity of this inventory. 

3. The psychological factor of recency might be 
expected to influence the amount of experience indi- 
cated in a particular activity and thus the validity 
of the measure. 

4. The amount of experience in an activity is not 
a direct function of time spent in the activity, but 
would seem to be dependent upon attentiveness, in- 
telligence, related experience, and possibly other 
factors. 

Recognizing these problems, the inventory was examined for content and construct validity. Content validity was tentatively established in the developmental stage by graduate counselor training students, who judged the appropriateness of items for measuring experience in the several interest areas. Criterion groups were not available for determining predictive and concurrent validity, but two analyses were made to examine the inventory for construct validity. This, essentially, is an attempt to validate the theory underlying the inventory.

Table 1

Age and Sex Composition of the Sample Used in Standardizing the Activity Inventory

Age        Male    Female    Total
15            5        14       19
16           90       114      204
17          211       232      443
18           78        73      151
19           12         5       17
20            2         0        2
Total       398       438      836
Mean      17.02     16.87    16.98

In the absence of other instruments giving a 
quantitative measure of experience, an Experience 
Data Blank was developed on which, by student re- 
sponse and by survey of school records, data were 
accumulated in each of the following areas: 


1. employment (described to indicate nature of job),

2. hobby and leisure-time activities, 

3. out-of-class school activities, 

4. home duties and unpaid work experience, 

5. courses taken in school. 


Experience data collected from the data blank 
were examined and each item listed by the student 
under the above categories was evaluated by the 
writer relative to the interest areas of the Activity 
Experience Inventory. If a particular experience 
listed by the student related to more than one of 
the interest areas, it was credited accordingly. If an 
activity could not logically be classified in any of 
the interest areas, it was omitted from the summa- 
rization. As a next step in analyzing the data, the 
blanks were sorted into stacks representing varying 
amounts of experience for an interest area and as- 
signed rank numbers representing the variation from 
a large amount to a small amount of experience.
This process of sorting and ranking was repeated for 
each of the interest areas. The rankings of experi- 
ence from the Experience Data Blanks were cor- 
related with scores from the Activity Experience In- 
ventory as indicated in Table 2. 
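The source does not name the coefficient used in Table 2. The data blanks were sorted into stacks and ranked, so many students share a rank. One natural choice in that case is a Spearman rank correlation, computed as the Pearson correlation of average ranks. The sketch below assumes that choice; it is an illustration, not the author's documented procedure. It also assumes the stacks are coded so that a larger number means more experience, so that agreement with the inventory shows up as a positive coefficient.

```python
from statistics import mean


def average_ranks(values):
    """Rank values from 1 upward, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # mean of positions i..j, converted to 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def pearson(x, y):
    """Product-moment correlation of two equal-length lists."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def spearman_rho(x, y):
    """Spearman's rho with tie handling (Pearson r on average ranks)."""
    return pearson(average_ranks(x), average_ranks(y))
```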


Table 2

Correlation of Activity Experience Inventory Scores for High School Students Against an Independent Measure of Experience

Interest Areas      Validity Coefficients (N = 76)
Mechanical              .82
Computational           .39
Scientific              .33
Persuasive              .45
Artistic                .37
Literary                .39
Musical                 .58
Social Service          .27
Clerical                .52
Median                  .39


Table 3

Means and Standard Deviations for Male and Female Experience Scores; Critical Ratios for Significance of Differences in Means

                     Male (N = 398)     Female (N = 438)
Interest Areas       Mean     SD        Mean     SD        CR of Means
Mechanical           45.60    17.70     16.35     9.88     29.08*
Computational        28.62    13.66     23.12    10.87      6.39*
Scientific           29.86    14.20     21.13    11.30      9.84*
Persuasive           24.85    14.82     23.88    14.90      0.94*
Artistic             21.02    12.55     29.80    14.12      9.51
Literary             21.06    12.15     28.72    14.93      8.16
Musical              22.66    18.88     29.00    18.50      4.89
Social Service       27.45    13.28     36.13    15.09      8.84
Clerical             23.46    13.60     30.53    15.50      7.01

* Mean score for males greater than mean score for females.


Results 


Table 2 shows validity coefficients ranging from .27 for the social service area to .82 for mechanical experience, with a median coefficient of .39. A study of the data on the experience blanks partly explains this range. The work experience and hobbies of a mechanical nature that students listed on the blanks were quite objective and varied. The musical and clerical areas also yielded experience information that was relatively easy to classify. These three areas show the highest validity coefficients. The definiteness of mechanical, musical, and clerical experiences contrasts with the indefiniteness of the social service area. High school courses and other experiences are less easily categorized as social service in nature, resulting in a lower validity coefficient.

The mean experience scores for boys and girls given in Table 3 provide additional evidence of validity. Males had significantly greater mean experience scores than females in the mechanical, computational, and scientific areas. Mean experience scores in persuasive activities did not differ significantly. Females had significantly greater mean experience scores than males in the artistic, literary, musical, social service, and clerical areas.
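The critical ratios in Table 3 appear to follow the usual large-sample formula for the difference between two independent means, CR = (M1 - M2) / sqrt(SD1²/N1 + SD2²/N2). The article does not state the formula, so this is an assumption. Under it, the Persuasive row reproduces the tabled 0.94, and most other rows agree closely, within a few hundredths. The table reports magnitudes, so for areas where females score higher the signed ratio below comes out negative.

```python
import math


def critical_ratio(m1, sd1, n1, m2, sd2, n2):
    """Difference between two independent means divided by its standard error."""
    se_diff = math.sqrt(sd1 ** 2 / n1 + sd2 ** 2 / n2)
    return (m1 - m2) / se_diff


# Persuasive row of Table 3 (males first, then females).
cr = critical_ratio(24.85, 14.82, 398, 23.88, 14.90, 438)
print(f"{cr:.2f}")  # 0.94, below the conventional 1.96 (.05) and 2.58 (.01) cutoffs
```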


The above data showing the relationship 
between experience scores obtained from the 
Activity Experience Inventory and the 
amount of experience obtained by an inde- 
pendent and quite different technique, the 
Experience Data Blank, and the mean ex- 
perience scores for males and females are 
suggestive of construct validity of the in- 
strument. 


Reliability 

Reliability coefficients obtained by correlating odd vs. even items of the inventory (Table 4) ranged from .87 to .94, with a mean of .90, for a sample of 398 male high school juniors and seniors. Five of the interest areas (mechanical, computational, persuasive, musical, and social service) gave coefficients equal to or greater than .90. For a sample of 438 female high school juniors and seniors, similar coefficients ranged from .82 to .92, with a mean of .89. The persuasive, artistic, musical, and social service areas gave coefficients greater than .90.
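An odd-even coefficient of the kind reported in Table 4 correlates each student's score on the odd-numbered items of an area with that student's score on the even-numbered items. The article does not say whether the half-test correlation was stepped up with the Spearman-Brown formula, 2r / (1 + r), so the sketch below reports both values. The data layout, with one list of item weights per student in test order, is a hypothetical assumption.

```python
import math
from statistics import mean


def pearson(x, y):
    """Product-moment correlation of two equal-length score lists."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def odd_even_reliability(item_matrix):
    """item_matrix: one list of item weights per student, items in test order.

    Returns (half_test_r, spearman_brown_r). Item 1 (index 0) counts as odd.
    """
    odd = [sum(items[0::2]) for items in item_matrix]
    even = [sum(items[1::2]) for items in item_matrix]
    r = pearson(odd, even)
    return r, 2 * r / (1 + r)
```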

Of the seniors who marked the activity inventory, 31 males and 35 females were available to mark it again in a six-month follow-up. The six-month interval between the first and second administrations spanned a summer, and the subjects had enrolled in college near the end of this period. Test-retest reliability coefficients (Table 4) ranged for males from .75 to .91, with a mean of .83, and for females from .60 to .79, with a mean of .73.

To examine the stability of profiles, the experience scores of 31 males and 35 females were converted to standard scores and ranked in order of these scores. Rho coefficients were then computed between the paired rankings.
These coefficients represent the degree of simi- 
larity of experience profiles resulting from 
two administrations of the inventory with a 
time interval of six months. The rho coeffi- 
cients for males ranged from .30 to .98, with 
a median of .82. For the 35 females the rho 
coefficients ranged from .45 to .93, with a 
median of .77. 
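The profile-stability check can be sketched as below. The following assumptions are not stated in the source. Each area's raw score is converted to a standard score using the norm-group mean and SD for the student's sex, for example the Table 3 values. The nine standard scores from each administration are ranked within the student. Rho is then computed with the no-ties formula rho = 1 - 6 Σd² / (n(n² - 1)). If standard scores tie, the average-rank method would be needed instead. The function names are hypothetical.

```python
def z_profile(raw, norms):
    """raw: dict area -> raw score; norms: dict area -> (mean, sd)."""
    return {area: (raw[area] - m) / sd for area, (m, sd) in norms.items()}


def ranks_within(profile):
    """Rank areas 1 (highest standard score) to n; assumes no ties."""
    ordered = sorted(profile, key=profile.get, reverse=True)
    return {area: position for position, area in enumerate(ordered, start=1)}


def profile_rho(first, second, norms):
    """Spearman rho between the area rankings from two administrations."""
    r1 = ranks_within(z_profile(first, norms))
    r2 = ranks_within(z_profile(second, norms))
    n = len(norms)
    d_squared = sum((r1[a] - r2[a]) ** 2 for a in norms)
    return 1 - 6 * d_squared / (n * (n * n - 1))
```

Ranking within the student, rather than across students, is what makes this a measure of profile shape rather than of score level.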


Intercorrelations 


Table 5 gives intercorrelations calculated from a 20 per cent sample of the original data. The coefficients for the areas of the inventory ranged for males from .09 to .81, with a median coefficient of .54. For females the coefficients ranged from .29 to .75, with a median of .53. The correlations between areas are approximately the same for males and females, with the exception of three coefficients. The musical area, when correlated with the mechanical, computational, and scientific areas, gave somewhat larger coefficients for females than for males.

These intercorrelations seem large for effi- 
cient testing but may be justified for an in- 
strument of this nature. Since the experi- 
ences of youth tend to encompass all the in- 
terest categories, one would expect positive 
intercorrelations for areas of the inventory. 

Correlations between areas of the Activity 
Experience Inventory are considerably greater 
than those for areas of the Kuder Preference Record (3), for which item selection and instrument design assured low intercorrelations.
The intercorrelation coefficients for the areas 
of the inventory are about the same magni- 
tude as the coefficients between areas of the 
Strong Vocational Interest Blank (5). 


Normative Data 


The age-grade characteristics of the stand- 
ardization group and the mean experience 
scores and standard deviations for males and 
females are given in Tables 1 and 3, respec- 
tively. The male and female mean experi- 
ence scores were sufficiently different to make 
separate norms necessary. Percentile norms 
developed for each interest area were ar- 
ranged in table form to facilitate construc- 
tion of individual profile charts from raw 
scores. 
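A percentile norm table of the kind described can be built from the raw scores of the standardization sample for one area and one sex. The article does not give its percentile convention. The sketch below assumes the common midpoint definition: percentile rank = 100 × (number scoring below + half the number scoring at that score) / N. It also assumes a 0-100 raw-score range, which follows from 25 items weighted 0-4. The function name is hypothetical.

```python
from collections import Counter


def percentile_norms(scores, max_score=100):
    """Map every possible raw score 0..max_score to a midpoint percentile rank."""
    n = len(scores)
    if n == 0:
        raise ValueError("empty norm sample")
    freq = Counter(scores)
    table = {}
    below = 0
    for raw in range(max_score + 1):
        at = freq.get(raw, 0)
        table[raw] = round(100 * (below + at / 2) / n, 1)
        below += at
    return table
```

Separate tables would be built for males and females in each area, in line with the separate norms reported above.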


Table 4

Reliability Coefficients Obtained by Correlating Odd vs. Even Items and from Test-Retest Scores

                       Odd vs. Even Items               Test-Retest
Interest Areas      Male (N = 398)  Female (N = 438)   Male (N = 31)  Female (N = 35)
Mechanical              .94             .87                .84            .72
Computational           .90             .82                .85            .70
Scientific              .88             .86                .83            .79
Persuasive              .92             .92                .89            .76
Artistic                .89             .92                .78            .69
Literary                .89             .88                .82            .77
Musical                 .91             .91                .91            .72
Social Service          .91             .92                .75            .79
Clerical                .87             .83                .76            .60
Mean                    .90             .89                .83            .73


Table 5

Intercorrelations for Areas of the Activity Experience Inventory for Both Male and Female
N (Male) = 79; N (Female) = 89

Interest Areas             1      2      3      4      5      6      7      8
1. Mechanical
2. Computational    M    .48
                    F    .48
3. Scientific       M    .50    .53
                    F    .52    .60
4. Persuasive       M    .29    .70    .41
                    F    .29    .64    .67
5. Artistic         M    .41    .61    .42    .61
                    F    .55    .54    .60    .54
6. Literary         M    .30    .63    .61    .61    .65
                    F    .48    .55    .62    .68    .56
7. Musical          M    .14†   .24†   .09†   .51    .35    .43
                    F    .44    .44    .60    .60    .52    .41
8. Social Service   M    .47    .68    .53    .73    .61    .68    .46
                    F    .53    .51    .57    .58    .61    .60    .50
9. Clerical         M    .38    .81    .54    .67    .57    .65    .21†   .65
                    F    .44    .75    .41    .57    .52    .47    .37    .62

Note. M rows give male and F rows female intercorrelations.
† Not significant at the 1% level.


Summary

1. This report describes the development of the Activity Experience Inventory, a measure of manifest interest, using the interest areas of the Kuder Preference Record, Form BB, as the framework for the inventory.

2. The experience inventory was designed 
to be objectively scored, to give a composite 
score of experience for each of the interest 
areas, to contain activity items within the 
probable experience of boys and girls of high 
school age, to use vocabulary easily under- 
stood by high school age groups, to contain 
directions to permit self-administration and 
scoring, and to be administered and scored 
in about fifty minutes. 

3. Validity coefficients computed from the Activity Experience Inventory and the Experience Data Blank ranged from .27 to .82, with a median of .39. A study of mean experience patterns for boys and girls gave further evidence of validity. Males had significantly greater mean experience scores than females in the mechanical, computational, and scientific areas. Females had significantly greater mean experience scores in the artistic, literary, musical, social service, and clerical areas. There was no significant difference in mean persuasive scores.

4. Odd-even reliability coefficients for the 
Activity Experience Inventory varied for 
males from .87 to .94, with a mean of .90, 
and for females from .82 to .92, with a mean 
of .89. Test-retest reliability coefficients with 
a six-month time interval varied for males 
from .75 to .91, with a mean of .83, and for 
females from .60 to .79, with a mean of .73. 

Rho coefficients were calculated from test-retest experience profiles as an indication of profile stability. With six months between the two administrations of the inventory, the coefficients ranged for males from .30 to .98, with a median of .82. Rho coefficients for females ranged from .45 to .93, with a median of .77.

5. Intercorrelations for areas of the inven- 
tory ranged for males from .09 to .81, with 
a median coefficient of .54. For females the 
coefficients ranged from .29 to .75, with a
median of .53. 

6. Tentative norms, separate for males and 
females, were established using a sample of 
398 males and 438 females who were juniors 
and seniors in three California high schools. 


Received August 26, 1955. 


References 


1. Dressel, P. L., & Matteson, R. W. The relationship between experience and interest as measured by the Kuder Preference Record. Educ. psychol. Measmt, 1952, 12, 109-116.

2. Fryer, D. The measurement of interests in relation to human adjustment. New York: Holt, 1931. Pp. 1-363.

3. Kuder, G. F. Revised manual for the Kuder Preference Record. Chicago: Science Research Associates, 1946. Pp. 3-30.

4. Older, H. J. An objective test of vocational interest. J. appl. Psychol., 1944, 28, 99-108.

5. Strong, E. K. Vocational interests of men and women. Stanford, Calif.: Stanford Univer. Press, 1943. Pp. 1-726.

6. Super, D. E. Appraising vocational fitness. New York: Harper, 1949. Pp. 1-642.

7. Super, D. E., & Haddad, W. C. The effect of familiarity with an occupational field on a recognition test of vocational interest. J. educ. Psychol., 1943, 34, 103-109.

8. Super, D. E., & Roper, Sylvia A. An objective technique for testing vocational interests. J. appl. Psychol., 1941, 25, 487-498.

9. Travers, R. M. W. Educational measurement. New York: Macmillan, 1955. Pp. 3-407.




Fakability of the Gordon Personal Profile

Jay T. Rusmore
San José State College


The Gordon Personal Profile (1) is a personality test developed by the forced-choice technique. The test's authors do not claim that the forced-choice approach renders the Profile “fakeproof,” but they state that “. . . it probably is less subject to ‘faking’ than inventory-type instruments” (1, p. 10). The present study was undertaken to test this statement.

Longstaff and Jurgensen (2) administered the Jurgensen Classification Inventory, a forced-choice instrument, to 68 students under two sets of directions. The first set represented an industrial selection situation, and the second a vocational guidance situation. Both administrations were scored on a Self-Confidence key, and the differences in mean scores between the two administrations were not statistically significant. However, the correlation coefficient between scores in the two administrations was .50, which the authors interpret as being “not encouraging.”

Their interpretation of their data leads them to the position that other techniques must be devised if the problem of malingering on personality tests is to be overcome. The present writer would concur that no instrument is likely to be proof against malingering. His experience with the forced-choice test, however, has led him to the position that the prospects for the forced-choice personality test are somewhat brighter than has previously been stated.

In the present study, the fakability of the Gordon Personal Profile was examined using the experimental situation developed by Longstaff and Jurgensen (2, p. 88). The Classification Inventory and the Personal Profile both employ a forced-choice format, but they were developed differently. Items were included in the Personal Profile on the basis of differential discriminating ability as determined by factorial composition, while no such selection was made for the Classification Inventory (2). Furthermore, pairs of items in the Personal Profile were prematched on both equality of preference value and differential discriminating ability, while items in the Classification Inventory were equated only on preference value. Finally, all items in the Personal Profile are scored on established keys, while the Classification Inventory has keys developed empirically for particular situations, each using only some of the items. In view of these differences, the two tests may conceivably differ in fakability.


Procedure 


A group of 81 lower-division students was given the Gordon Personal Profile twice, each time with different instructions.

The first administration was in a simulated indus- 
trial situation. Directions were: “In taking this test 
make the following assumptions: You have just finished
your college work and are in the employment
department of the organization you hope to work 
for, applying for a job. This job you are applying 
for is exactly the kind of job you want so it is very 
important to you that you get it. The personnel 
manager informs you that the company has a bat- 
tery of tests they give all their applicants and says, 
‘This is the first test in the battery. It is called the 
Gordon Personal Profile. You will please read the 
directions and then answer the questions.’ ” 

The second administration was in a simulated 
guidance situation. Directions were: “At the last 
meeting of the class you took the Gordon Personal 
Profile assuming you were applying for a job. Today
I would like to have you take the test again,
making the following assumptions: You are having 
a great deal of trouble trying to decide what voca- 
tion you should go into. You finally decide to go 
to the Student Counseling Bureau to see if they can 
give you any assistance. The counselor informs you, 
‘We have a battery of tests we should like to have 
you take. We have found the results very helpful 
in dealing with problems like your own. The first 
test in the battery is called the Gordon Personal 
Profile. Will you please read the directions and 
then answer the questions?’ ” 


Results 


The scales of the Gordon Personal Profile are Ascendancy (A), Responsibility (R), Emotional Stability (E), Sociability (S), and Total, or over-all self-evaluation (T). Mean scores for each scale for both simulated situations are given in Table 1.

Table 1
Mean Raw Scores and Standard Deviations on Each Scale of the Gordon Personal Profile,
Administered Under Two Different Sets of Instructions
(N = 81 San José State College Psychology Students)

                                            Scales
                        A             R             E             S             T
Situation          Mean    SD    Mean    SD    Mean    SD    Mean    SD    Mean    SD
Industrial          4.4   5.5     8.9   5.1     8.0   5.5     4.8   6.0    26.1  13.9
Vocational          3.7   6.2     7.4   5.8     7.3   6.7     4.4   6.8    23.0  16.6
Difference           .7            1.5            .7            .4           3.1
Significance of
difference (t)      1.4       [illegible]**     1.3       [illegible]        2.0*

* Significant at the 5% level of confidence.
** Significant at the 1% level of confidence.

It may be noted that for each scale the difference is in favor of the "better" score for the simulated industrial situation. The t test for the significance of these differences indicates that only for R and T are the differences significant: R at the 1 per cent level of confidence, T at the 5 per cent level.
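The article does not show how the t values were computed. A common approach when the same subjects take both administrations uses the variance of the difference scores, sd1² + sd2² − 2·r·sd1·sd2. The minimal Python sketch below applies that formula, as an assumption, to the rounded means and SDs of Table 1 and the correlations of Table 2 (using .59 for T, the value given in the text). Because the inputs are rounded, the results only approximate the printed values of about 1.4 for A and 2.0 for T.

```python
import math


def paired_t_from_summary(mean1, sd1, mean2, sd2, r, n):
    """t for the difference between two correlated means (same Ss tested twice).

    Assumes the difference-score variance is sd1**2 + sd2**2 - 2*r*sd1*sd2;
    the article does not say which formula was used.
    """
    sd_diff = math.sqrt(sd1 ** 2 + sd2 ** 2 - 2 * r * sd1 * sd2)
    standard_error = sd_diff / math.sqrt(n)
    return (mean1 - mean2) / standard_error


# (industrial mean, SD, vocational mean, SD, r) from Tables 1 and 2; N = 81.
SCALES = {
    "A": (4.4, 5.5, 3.7, 6.2, 0.70),
    "R": (8.9, 5.1, 7.4, 5.8, 0.64),
    "T": (26.1, 13.9, 23.0, 16.6, 0.59),
}

if __name__ == "__main__":
    for name, (m1, s1, m2, s2, r) in SCALES.items():
        t = paired_t_from_summary(m1, s1, m2, s2, r, 81)
        print(f"{name}: t = {t:.2f} (df = 80)")
```

For R the recomputed t comes out near 2.9. That is consistent with the text's report that R is significant at the 1 per cent level, but it is a recomputation and is not offered as a reading of the illegible table entry.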

The Profile was equally reliable under both sets of directions. The reliability coefficients for the "industrial" and "vocational" situations, respectively, are, by scale: A, .87 and .89; R, .86 and .88; E, .81 and .70; S, .87 and .92; T, .93 and .94.


Table 2
Relation Between Raw Scores Made on Each Scale of the Gordon Personal Profile,
Administered Under Two Different Sets of Instructions
(N = 81 San José State College Psychology Students)

                                              Scales
                                      A      R      E      S              T
r (Industrial)(Vocational)           .70    .64    .67    .79            .59
r (Industrial)(Vocational),
  corrected for attenuation
  in both variables                  .79    .73    .89    [illegible]    [illegible]


Correlations between each of the scales as 
administered in the two situations are given 
in Table 2. Total score correlational infor- 
mation is also presented. 

It will be seen that the correlation between scores for each trait in the two situations is substantial. For those who may be interested, the coefficients are also shown corrected for attenuation, that is, as estimates of the relationship that would hold if both measures were perfectly reliable.
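The correction referred to is presumably Spearman's correction for attenuation. It divides the observed correlation by the square root of the product of the two reliabilities. The minimal Python sketch below assumes that formula and uses the rounded reliabilities reported above. It reproduces the legible corrected values in Table 2 to within about .01.

```python
import math


def correct_for_attenuation(r_xy, rel_x, rel_y):
    """Spearman's correction: estimated correlation between true scores,
    given the observed correlation and the reliability of each measure."""
    return r_xy / math.sqrt(rel_x * rel_y)


# (observed r, industrial reliability, vocational reliability) from the text.
SCALES = {"A": (0.70, 0.87, 0.89), "R": (0.64, 0.86, 0.88), "E": (0.67, 0.81, 0.70)}

if __name__ == "__main__":
    for name, (r, rel_ind, rel_voc) in SCALES.items():
        print(name, round(correct_for_attenuation(r, rel_ind, rel_voc), 2))
```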

It will be seen that the correlations between the scores for each trait in the two situations range from .64 to .79, while the value for the Total score is .59. This is a departure from what may ordinarily be expected from a summary score. In a personal communication, the test author proposes that, for the present data, unit increases in score on any trait are smaller, in standard score terms, than the sum of these units, in standard score terms, for the Total score. Under this condition, the test-retest correlations may be less disturbed by introduced variance for the traits than for the Total. The test author has undertaken an empirical test of this proposal, and his experience is reported in the following paragraph:


I tried this out on my own data where, with 121
subjects, I obtained test-retest correlations of .798,
.678, .739 and .782 for ARES and .793 for T....
I added an additional




for Responsibility, which was the lowest, dropped to
.620 while that for the Total dropped to .602, which
is lower than the lowest trait score.†

† L. V. Gordon, Personal communication. June 14, 1955.


Conclusions


1. In general, individuals have a slight tendency to show themselves to better advantage in the simulated industrial selection situation than in the simulated vocational guidance situation. The difference between the means of the Total score for the two administrations is statistically significant at the 5 per cent level, but this difference is not of great practical significance, being equivalent to an increase of about 8 percentile points.

2. Of the four scales in the test, Responsibility shows a significant difference in favor of the simulated industrial selection administration at the 1 per cent level of confidence. This mean increase is equivalent to about 9 percentile points. The difference between mean scores on the two administrations for the Ascendancy, Emotional Stability, and Sociability scales is not statistically significant.

3. The correlation coefficients for the four 
traits between the scores on the simulated in- 
dustrial selection and vocational guidance ad- 
ministrations are substantial. This indicates 
that the subjects did not change their re- 
sponses substantially from one set of direc- 
tions to the other. 

4. Present results support the contention 
that the Gordon Personal Profile “. . . prob- 
ably is less subject to ‘faking’ than inventory- 
type instruments.” The results of the pres- 
ent study contrast with those reported for 
nonforced-choice type instruments in resist- 
ance to being faked. 


Received August 8, 1955. 


References 


1. Gordon, L. V. Manual, Gordon Personal Profile. 
Yonkers, N. Y.: World Book, 1953. 

2. Longstaff, H. P., & Jurgensen, C. E. Fakability 
of the Jurgensen Classification Inventory. J. 
appl. Psychol., 1953, 37, 86-89. 




The Journal of Applied Psychology 
Vol. 40, No. 3, 1956 


Evaluation of Angular Digits and Comparisons with a
Conventional Set¹


P. J. Foley 


Defence Research Medical Laboratories, Toronto 


Lansdell (4) has studied the legibility of 
two sets of conventional type digits, those of 
Mackworth (1) and Mound (4), and a set 
of digits designed to make maximum use of 
easily discriminated forms. When compared 
under poor viewing conditions which gave 
51.5 per cent correct identification of the 
conventional sets, the new digits were cor- 
rectly identified 67.4 per cent of the time. 
The design of the new set of digits is such that they are recognized as numbers after as few as three presentations.

Lansdell has since made some revisions. 
This is a report upon four experiments car- 
ried out with the revised set to answer the 
following questions: 


1. What are the confusion errors? 

2. Is the legibility of these digits independ- 
ent of whether they are presented as black 
figures on a white ground, or as white figures 
on a black ground? 

3. Is this set more legible than a typical
conventional set under varied conditions of
exposure and illumination?

4. Is this set more legible than a typical 
conventional set when the digits are viewed 
obliquely? 

The conventional set chosen for compari- 
son was that of Mackworth (1). This set 
was decided upon because it shows consist- 
ently high performance in comparisons made 
by other investigators (3, 4, 6, 7). 


Method 


General procedure. The procedure common to all 
experiments is as follows: 

The digits were presented singly to the subjects 
(Ss) who viewed them at a distance of 20 feet. The 
exposure time, rate of presentation, and illumination 
level were controlled by the experimenter.

The Ss sat within a boxlike structure, so that the field of view was restricted to the screen and its immediate surround. The room itself was dark. Responses were noted by an assistant seated just outside the box.

¹ Defence Research Medical Laboratories Report No. 76-2, Project No. D77-94-20-21 (H.R. No. 117).

Apparatus. A magazine-load automatic slide pro- 
jector was used. This was modified so that the 
shutter remained open as the slides were changing, 
ensuring continuous illumination on the screen inde- 
pendently of whether or not a digit was present. 
Each digit thus appeared first as a blur on the screen, 
was brought sharply into focus for the period of ex- 
posure, then disappeared, giving place to the blur of 
the succeeding digit. Preliminary experiments showed 
that this blur had no effect on subsequent legibility. 

The projector was connected to two interval timers 
such that the exposure time and the interval be- 
tween exposures were independently controlled. 

Illumination on the screen was varied by placing neutral density filters in front of the projector lens. The required values were determined empirically by measuring the resultant illumination on the screen. These measurements were made using the Macbeth Illuminometer.
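The filter values here were set empirically with the illuminometer. As a rough nominal guide, an ideal neutral density filter of optical density D transmits a fraction 10^(-D) of the incident light. The density needed to bring an unfiltered level down to a target level is therefore log10(unfiltered/target). The minimal Python sketch below assumes an unfiltered screen illumination of 100 foot-candles purely for illustration, since the article does not report that value. Real filters and lamp output depart from the ideal, which is one reason direct measurement is preferable.

```python
import math


def nd_density_for(target_fc, unfiltered_fc):
    """Optical density of an ideal neutral density filter that reduces
    unfiltered_fc to target_fc, using transmittance = 10 ** (-density)."""
    if not 0 < target_fc <= unfiltered_fc:
        raise ValueError("target must be positive and no brighter than the source")
    return math.log10(unfiltered_fc / target_fc)


UNFILTERED_FC = 100.0  # hypothetical unfiltered screen illumination, foot-candles

if __name__ == "__main__":
    for level in (10, 30, 50):
        print(level, "fc ->", round(nd_density_for(level, UNFILTERED_FC), 2))
```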

The screen, 3 X 3 in, was made of white Bristol 
board and was mounted so that it could be made to 
rotate about its vertical axis. 

The digits used are shown in Fig. 1. Specifica- 
tions were as follows. 


1. Mackworth: height/width ratio was 2:1. Stroke 
width was constant and equal to 12.5 per cent of 
height. 

2. Revised Lansdell: vertical and horizontal tan- 
gents of each digit, with the exception of the digit 
“1,” form a rectangle with a height/width ratio of 
2:1. The digit "1" has a height/width ratio of
13.3:1.²

All digits were mounted singly on 2 × 2-in. glass
slides, and when projected on the screen were 3% in. 
high. 

Subjects. The Ss' ages ranged from 18 to 37 years. All had 20/20 or better binocular acuity at a distance, as tested on the U. S. Armed Forces Vision Tester. Before every experimental session Ss were shown each digit three times, under the conditions obtaining in that session, and were told the identity of the digit before each presentation.

² A drawing of each of these digits and results of the statistical analyses for this and succeeding experiments have been deposited with the American Documentation Institute. Order Document No. 4835 from ADI Auxiliary Publications Project, Photoduplication Service, Library of Congress, Washington 25, D. C., remitting in advance $1.25 for microfilm or $1.25 for photocopies. Make checks payable to Chief, Photoduplication Service, Library of Congress.

[Figure: the revised Lansdell digits and the Mackworth digits]
Fig. 1. Lansdell digits and Mackworth digits.


Experiment 1: Confusion Errors of Revised 
Lansdell Digits 


Digits were exposed singly for .6 second with a 
presentation rate of one digit every three seconds. 
Each S was given 300 presentations, 30 for each 
digit. The order was random. The illumination 
level on the screen was ten foot-candles. There were 
15 Ss, who had no previous experience with the 
Lansdell digits. 


Results.³ The specific confusions that contributed more than 5 per cent to the total error were the 3 with the 5 (6.12%), the 3 with the 7 (6.0%), the 5 with the 3 (6.6%), the 9 with the 5 (5.2%), and the 0 with the 8 (8.9%).
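Each percentage above is one confusion's share of all identification errors. The minimal Python sketch below shows that tally. It assumes responses are recorded as (presented, reported) pairs and that "the 3 with the 5" means a presented 3 reported as 5. The data in the usage block are invented.

```python
from collections import Counter


def confusion_shares(responses, threshold=5.0):
    """Return {(presented, reported): percent of all errors} for every
    confusion contributing more than `threshold` per cent of total error."""
    errors = Counter((shown, said) for shown, said in responses if shown != said)
    total = sum(errors.values())
    if total == 0:
        return {}
    return {
        pair: 100.0 * count / total
        for pair, count in errors.items()
        if 100.0 * count / total > threshold
    }


if __name__ == "__main__":
    # Invented responses: two 3->5 errors plus ten different single errors.
    demo = [("3", "5")] * 2 + [(str(d), str((d + 1) % 10)) for d in range(10)]
    print(confusion_shares(demo, threshold=10.0))
```

Correct responses (presented equals reported) are ignored, so the denominator is total error, not total presentations.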


Experiment 2: Revised Lansdell Black on 
White vs. White on Black 


Revised Lansdell digits, black on white, were compared with revised Lansdell digits, white on black, at three illumination levels, 10, 30, and 50 foot-candles. Digits were exposed singly for .5 second, with a presentation rate of one digit every three seconds. The design used was a 3 × 2 factorial, giving six conditions per S. Each S was presented with each digit three times under each condition, giving a total of 30 presentations per S per condition. Digits were presented randomly to each S under each condition, and the conditions were also presented randomly. The Ss were given five minutes of preadaptation for each illumination level. There were 10 Ss, drawn from the 15 used in Experiment 1.


Results. Analysis of variance shows that differences between Ss are significant (P < .01). None of the subject interactions are significant, however, indicating that the responses of all Ss are in the same direction over all conditions. The interaction between digit type and illumination level is significant (P < .01), showing that the legibility of the revised Lansdell digits is not independent of whether they are presented as black figures on a white ground or as white figures on a black ground. If the illumination level is of the order of 10 foot-candles, white digits on a black ground are more legible; if the illumination level is from 30 to 50 foot-candles, black digits on a white ground are more legible.

³ See footnote 2.


Experiment 3: Revised Lansdell Digits vs. 
Mackworth Digits 


Revised Lansdell digits, black on white, were compared with Mackworth digits, black on white, at three illumination levels, 10, 30, and 50 foot-candles, and at three exposure times, .3, .8, and 1.3 seconds. Digits were exposed singly, with a presentation rate of one digit every three seconds. The design was a 3 × 3 × 2 factorial analysis of variance, giving 18 conditions per S. Each S was presented with each digit three times during each condition, giving a total of 30 presentations per S per condition. Digits were presented randomly to each S under each condition. In order to decrease the length of each session by avoiding preadaptation periods between conditions, and since the effect of illumination level on both digit types was known, a split-plot design was used (2). This design confounded illumination with sessions but did not affect the main comparison of exposure time with digit type. Six Ss from Experiment 2 were used. One of the six possible orders of the three illumination levels was allotted to each S. The conditions of exposure with digit type were randomized within each session.


Results. Analysis of variance shows that the difference between the legibility of the revised Lansdell digits and the Mackworth digits is highly significant (P < .01).

The interaction between exposure and digit type is not significant. There is a highly significant increase in percentage correct from Exposure 1 to Exposure 3 (P < .01). There is no evidence of departure from linearity when Exposure 1 and Exposure 3 are compared with Exposure 2.

The same holds for illumination level: as illumination increases from 10 to 50 foot-candles, there is a highly significant increase in percentage correct (P < .01), with no evidence of departure from linearity.
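The three exposure times (.3, .8, and 1.3 seconds) and the three illumination levels (10, 30, and 50 foot-candles) are equally spaced. The linear trend and the departure from linearity can therefore each be tested as a single-degree-of-freedom contrast, using the standard orthogonal polynomial coefficients (-1, 0, 1) and (1, -2, 1). The article does not give its computations. The minimal Python sketch below illustrates the contrast sums of squares with invented means. Each F would then be the contrast sum of squares divided by the appropriate error mean square.

```python
def contrast_ss(level_means, n_per_mean, weights):
    """Sum of squares for a single-df contrast among equally weighted means:
    n * (sum of c_i * mean_i) ** 2 / (sum of c_i ** 2)."""
    if len(level_means) != len(weights):
        raise ValueError("one weight per mean")
    estimate = sum(c * m for c, m in zip(weights, level_means))
    return n_per_mean * estimate ** 2 / sum(c * c for c in weights)


LINEAR = (-1, 0, 1)     # level 3 versus level 1
QUADRATIC = (1, -2, 1)  # levels 1 and 3 versus level 2: departure from linearity

if __name__ == "__main__":
    means = [62.0, 71.0, 80.0]  # invented percentage-correct means, 10 observations each
    print("linear SS:", contrast_ss(means, 10, LINEAR))
    print("quadratic SS:", contrast_ss(means, 10, QUADRATIC))
```

With perfectly evenly spaced means, as in this invented example, the quadratic sum of squares is zero, which is what "no departure from linearity" would look like in the limit.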


Experiment 4: Revised Lansdell Digits vs. 
Mackworth Digits at Different Angles of 
View 

Revised Lansdell digits, black on white, were compared with Mackworth digits, black on white, at three angles of view: 45° left, normal, and 45° right. Digits were exposed singly for .8 second, with a presentation rate of one digit every three seconds. The illumination on the screen was 30 foot-candles. The design was a 3 × 2 factorial analysis of variance, giving six conditions per S. Each S was presented with each digit three times during each condition, giving a total of 30 presentations per S per condition. Digits were presented randomly, as were conditions. There were five Ss, all of whom had been used in the previous experiments.


Results. The revised Lansdell digits are significantly more legible than the Mackworth digits under these conditions (P < .01). There is no interaction between digit type and viewing angle. The degrees of freedom for angles of view were broken up and the following comparisons made: (a) 45° R − 45° L: the difference is not significant, indicating that it does not matter from which side the digits are viewed. (b) 45° R + 45° L − 2(normal): the difference is significant, and shows a decrease in legibility from the normal to the oblique angle of view. None of the interactions approach significance.
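Comparisons (a) and (b) are orthogonal. Together they split the two degrees of freedom for angle of view without overlap, so their sums of squares add up to the sum of squares for angles. The minimal Python check below uses invented means, ordered 45° L, normal, 45° R.

```python
def contrast_ss(means, n_per_mean, weights):
    """Single-df contrast sum of squares: n * (sum c_i * m_i) ** 2 / sum c_i ** 2."""
    estimate = sum(c * m for c, m in zip(weights, means))
    return n_per_mean * estimate ** 2 / sum(c * c for c in weights)


def between_ss(means, n_per_mean):
    """Sum of squares among equally weighted means."""
    grand = sum(means) / len(means)
    return n_per_mean * sum((m - grand) ** 2 for m in means)


SIDE = (-1, 0, 1)     # (a) 45° R minus 45° L
OBLIQUE = (1, -2, 1)  # (b) 45° R + 45° L minus 2 x normal

if __name__ == "__main__":
    angle_means = [70.0, 80.0, 66.0]  # invented: 45° L, normal, 45° R
    n = 5
    ss_a = contrast_ss(angle_means, n, SIDE)
    ss_b = contrast_ss(angle_means, n, OBLIQUE)
    print(ss_a, ss_b, ss_a + ss_b, between_ss(angle_means, n))
```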

Summary and Conclusions 


A new set of digits designed to make maxi- 
mum use of easily discriminated forms was 
studied. Data on confusion errors are given. 
The legibility of the new digits is not inde- 
pendent of whether they are presented as 
black on a white ground or as white on a 
black ground. At low illumination levels 
white on black is more legible, the reverse 
being true at high illumination levels. Com- 
parisons with a conventional set, the Mack- 
worth digits, at different illumination levels, 
exposure times, and angles of view, show the 
new set to be significantly more legible under 
all of these conditions. 


Received September 19, 1955. 


References 


1. Bartlett, F., & Mackworth, N. H. Planned seeing. London: H. M. Stationery Office, 1950.

2. Cochran, W. G., & Cox, G. M. Experimental designs. New York: Wiley, 1950. Ch. 7.

3. Crook, M. M., & Baxter, F. S. The design of digits. USAF, WADC Tech. Rep., 1954, No. 54-262.

4. Lansdell, H. The effect of form on the legibility of numbers. DRML Report No. 76-1, Canad. J. Psychol., 1954, 8, 77-79.

5. Quenouille, M. H. Introductory statistics. London: Butterworth-Spring, Ltd., 1950.

6. Reinwald, F. L. Design of visual displays. Hamilton, N. Y.: Colgate University, 1953. (Dept. of Psychol. Prog. Rep. No. 3, Contract AF 30(602)-212.)

7. Schapiro, H. B. Factors affecting legibility of digits. USAF, WADC Tech. Rep., 1952, No. 52-127.




The Journal of Applied Psychology 
Vol. 40, No. 3, 1956 


Dimensional Analysis of Motion: IX. Comparison of
Visual and Nonvisual Control of Component
Movements¹


Janet Huiskamp, Robert C. Smader, and K. U. Smith 


University of Wisconsin 


This is a study of perception and human 
motion. In the investigation an attempt is 
made to determine the role of perceptual fac- 
tors in the determination of the component 
movements making up a skilled pattern of 
motion. 

The study to be described here deals with 
a comparison of visual and nonvisual control 
of the component movements of manipulation 
and travel in a panel control task. The re- 
sults of the study, as it has been planned, 
have a bearing on a number of problems. 
The data obtained on skilled performance 
provide information on the nature of blind 
movements. Results on learning under the 
different perceptual conditions suggest cer- 
tain theoretical and practical relations be- 
tween perception and motion. In addition, 
data on transfer effects are interpreted in re- 
lation to the problems of the organization of 
motion in work. 


Method 
Apparatus 


Electronic methods of motion analysis are used in 
this experiment to measure precisely the duration of 
component movements involved in a panel control 
task. These methods have been described in detail 
before (1, 2) and will be discussed here only in a 
general way. 

The apparatus is presented schematically in Fig. 1. 
There are five rows of switches or knobs on the 
work panel, each of which turns only to the right 
and through an arc of about 45 degrees. The dis- 
tance between all adjacent knobs, horizontally and 
vertically, is the same. Only four rows of knobs 
with the four center switches in each row were used 
and the remainder were masked. Each knob on the 
panel was comfortably within reach of the S, whose 
task was to turn each of the 16 knobs one after the 
other as quickly as possible. 

The internal housing of the apparatus is behind the panel and out of S's view. It consists of an electronic relay operating at a current level below the threshold value for the human skin. Connections are made between the human operator and this relay by means of an electrode held in S's left hand. Connections are also made between the relay and the knobs to be manipulated, so that when S comes in contact with any knob, the circuit is completed.

¹ This research was supported by funds provided by the National Science Foundation for the project "Perception and Human Motion."

The two clocks used in this apparatus record ma- 
nipulation time and travel time separately. The ma- 
nipulation clock starts recording as soon as the first 
knob in a pattern is contacted or manipulated, and 
stops at the moment S releases that knob. As soon 
as this first knob is released, the travel clock starts 
and continues to run until the next knob is grasped. 
Then the travel clock stops and the manipulation 
clock starts again. The two different movement
times are accumulated separately on the two clocks. 
All recording starts with the manipulation of the 
first knob contacted, which can be anywhere on the 
board, and ends with the manipulation of the final 
knob, which can be located as desired by means of 
an “ending” plug. This terminal knob stops both 
clocks at the end of the trial. 
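The two-clock logic can be sketched in Python as an accumulator over a time-ordered stream of contact and release events. This is an illustration of the timing rule only, not of the relay circuit. It assumes the trial ends at the release of the terminal knob, a point the description leaves open. The event times in the usage block are invented.

```python
def accumulate_times(events):
    """events: time-ordered (seconds, kind) pairs, kind 'contact' or 'release'.

    Manipulation time runs from each contact to the following release;
    travel time runs from each release to the next contact. Nothing is
    recorded before the first contact.
    """
    manipulation = travel = 0.0
    previous_time, previous_kind = None, None
    for time_s, kind in events:
        if kind not in ("contact", "release"):
            raise ValueError(f"unknown event kind: {kind!r}")
        if previous_kind == "contact" and kind == "release":
            manipulation += time_s - previous_time
        elif previous_kind == "release" and kind == "contact":
            travel += time_s - previous_time
        previous_time, previous_kind = time_s, kind
    return manipulation, travel


if __name__ == "__main__":
    trial = [(0.0, "contact"), (0.5, "release"), (0.8, "contact"),
             (1.2, "release"), (1.5, "contact"), (2.0, "release")]
    print(accumulate_times(trial))
```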


Experimental Design and Procedure 


The general design of this experiment is indicated 
in Table 1. Twenty-four university students were 
divided into two matched groups on the basis of pre- 
test scores. The groups were treated identically in 
all experimental conditions except that one group 
performed the task visually and the other group 
performed it blindfolded. These two groups will be 
referred to hereafter as the visual and the blind 
groups. The design covers both practice and transfer
effects. As shown in Table 1, Ss were first given
a pretest on both conditions, then had 10 days of 
practice in either the visual or blind condition. On 
the twelfth day, transfer trials were run. 

Each S performed the knob-turning task in four different directions. The directions used are A, from left to right; B, from right to left; C, from bottom to top; and D, from top to bottom. The effects of the sequence of presentation of these different directions of movement were controlled by using a replicated Latin-square design.
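The article does not print the squares it used. One simple construction, shown in the minimal Python sketch below, is a cyclic 4 × 4 Latin square replicated across subjects. Each direction then occupies each serial position equally often, and with 12 Ss per group each of the four orders is used three times. A cyclic square does not balance first-order carry-over effects; a Williams design would, if that were a concern.

```python
from collections import Counter


def cyclic_latin_square(treatments):
    """Row i is the treatment sequence rotated left by i places, so each
    treatment appears once in every row and once in every column."""
    k = len(treatments)
    return [[treatments[(i + j) % k] for j in range(k)] for i in range(k)]


def assign_orders(subjects, treatments):
    """Replicate the square: the subject in position idx receives row idx mod k."""
    square = cyclic_latin_square(treatments)
    return {s: square[idx % len(square)] for idx, s in enumerate(subjects)}


if __name__ == "__main__":
    orders = assign_orders([f"S{i}" for i in range(1, 13)], "ABCD")
    for subject, order in orders.items():
        print(subject, "".join(order))
    print(Counter("".join(o) for o in orders.values()))
```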

Before starting the experiment, each S was pre- 
sented with a schematic diagram of the arrangement 
of the knobs. The four directions of movement were 
illustrated and the nature of the task was explained. 
Each S was told that he would be blindfolded dur- 
ing his first experience with the apparatus. 

Without having seen the apparatus, each S performed 12 trials blindfolded, three trials in each direction of movement. All Ss performed an ABCD sequence. Before the first trial of a new direction of movement, the experimenter took S's hand and traced, by touching the knobs, the directional pattern to be followed. When the blind performance was completed, the blindfold was removed and S repeated his performance visually. By means of this pretest, Ss were matched, and one of each pair was assigned to the blind condition, the other serving in the visual condition.

[Figure: universal control panel, internal electronic relays, terminal board, and manipulation-time and travel-time clocks]
Fig. 1. Diagram of the preplanned work panel and the electronic motion analyzer.
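The article says only that Ss were matched on the pretest and that one member of each pair was assigned to each condition. The minimal Python sketch below shows one common way to do this. It assumes subjects are ranked on a single pretest score (for example, total pretest time), adjacent subjects are paired, and a random draw decides which member goes to the blind group. The subject labels and scores are hypothetical.

```python
import random


def matched_pair_assignment(pretest_scores, seed=None):
    """pretest_scores: {subject: score}. Rank subjects, pair neighbours, and
    randomly send one member of each pair to each condition."""
    if len(pretest_scores) % 2:
        raise ValueError("need an even number of subjects")
    rng = random.Random(seed)
    ranked = sorted(pretest_scores, key=pretest_scores.get)
    groups = {"blind": [], "visual": []}
    for i in range(0, len(ranked), 2):
        pair = [ranked[i], ranked[i + 1]]
        rng.shuffle(pair)
        groups["blind"].append(pair[0])
        groups["visual"].append(pair[1])
    return groups


if __name__ == "__main__":
    scores = {f"S{i}": 60 + 1.5 * i for i in range(1, 25)}  # hypothetical pretest totals
    print(matched_pair_assignment(scores, seed=1))
```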

On Day 2, each S performed a particular sequence 
of directional trials assigned to him by the replicated 
latin-square design. From Day 2 to Day 11, each 
S received practice under his perceptual condition.
Practice consisted of three trials in each of the four
directional patterns. The median score for each direction
was taken as his score for the day.

On the twelfth day, transfer tests were run. In these transfer tests, Ss who had practiced visually now performed blindfolded, and those who had practiced blind now performed visually.

Table 1
General Design

Pretest (Day 1)
All Ss perform 12 blind and 12 visual trials. Used for matching.

Learning (Days 2 through 11)
Group I: 12 blind trials for 10 days
Group II: 12 visual trials for 10 days

Transfer (Day 12)
Group I: 12 visual trials
Group II: 12 blind trials


Results²


The results of this experiment will be dis- 
cussed in terms of differences between the 
visual and blind groups in relation to the 
following: (a) skilled performance on the 
eleventh day of practice, (b) acquisition of 
skill as a function of practice, and (c) trans- 
fer effects. 


Skilled Performance 


The differences between the visual and 
blind groups on both the manipulative and 
travel components of the task are shown 
graphically in Fig. 2. In this bar graph, the 
data for the visual and blind groups are 
shown separately for the two different com- 
ponent movements. The mean duration of 
both movement components is significantly 
greater for the blind group. 

The results of the analyses of variance of the data on which Fig. 2 is based are shown in Table 2. In addition to the differences indicated in Fig. 2, the analysis shows that the direction of movement significantly affected the travel component of the task. The significant group-by-direction interaction for the travel component indicates that the four directional patterns were affected differently by the two perceptual conditions.

² The summaries of the analysis of variance of the different parts of the data, together with a summary of the critical data on which these analyses are based, are on file with the American Documentation Institute. Order Document No. 4776 from ADI Auxiliary Publications Project, Photoduplication Service, Library of Congress, Washington 25, D. C., remitting in advance $1.75 for microfilms or $2.50 for photocopies. Make checks payable to Chief, Photoduplication Service, Library of Congress.

Table 2
Analyses of Variance for the Manipulative and Travel Times of Skilled Performance on Day 11

Source                 df    Mean Square        F

Manipulation Time
Group                   1      117.3953      46.20**
Direction               3         .2448       2.39
Trials                  3         .1788       1.74
Ss/Groups              22        2.5413      24.80**
Group × direction       3         .2199       2.14
Group × trials          3         .0501
Error                  60         .1025
Total                  95

Travel Time
Group                   1       12.4272       9.88**
Direction               3        1.1097      12.89**
Trials                  3         .0684
Ss/Groups              22        1.2582      14.61**
Group × direction       3         .6248       7.26**
Group × trials          3         .1198       1.39
Error                  60         .0861
Total                  95

** Significant beyond the 1% level.

[Figure: mean duration in seconds of manipulation and travel for the visual and blind groups]
Fig. 2. Bar graph showing the difference between visual and blind conditions of manipulation and travel.
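Table 2 does not state which error term each F uses. The printed values are consistent with testing Group against Ss/Groups and every other source against the residual Error, as is usual when subjects are nested within groups. The Python sketch below recomputes the ratios from the printed mean squares. They agree with the table to within about .01, the small differences presumably reflecting rounding of the mean squares. Ratios below 1 (Trials and Group × trials) were not printed in the table.

```python
# Mean squares from Table 2.
MANIPULATION = {
    "Group": 117.3953, "Direction": 0.2448, "Trials": 0.1788,
    "Ss/Groups": 2.5413, "Group x direction": 0.2199,
    "Group x trials": 0.0501, "Error": 0.1025,
}
TRAVEL = {
    "Group": 12.4272, "Direction": 1.1097, "Trials": 0.0684,
    "Ss/Groups": 1.2582, "Group x direction": 0.6248,
    "Group x trials": 0.1198, "Error": 0.0861,
}


def f_ratios(ms):
    """Group is tested against Ss/Groups (subjects nested in groups);
    every other source, Ss/Groups included, against the residual Error."""
    ratios = {"Group": ms["Group"] / ms["Ss/Groups"]}
    for source, value in ms.items():
        if source not in ("Group", "Error"):
            ratios[source] = value / ms["Error"]
    return ratios


if __name__ == "__main__":
    for label, table in (("Manipulation", MANIPULATION), ("Travel", TRAVEL)):
        print(label)
        for source, f in f_ratios(table).items():
            print(f"  {source:<18} F = {f:6.2f}")
```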


Acquisition 

Figure 3 shows graphically the course of learning for the 11 days of practice. In this figure, the curves for the manipulative component are presented to the left, those for travel to the right. The two different curves for each component movement represent the perceptual conditions used. It can be observed that the difference between the visual and blind groups for both manipulative and travel components is greater on the first day than on any succeeding day. This difference is not only the result of the perceptual conditions, but also reflects the fact that on Day 1, the pretest day, the blind Ss had never had any contact with the apparatus, while the visual Ss had had 12 blind trials preceding their visual trials, which appear on the graph.

[Figure: learning curves, duration in seconds by day, for manipulation (left) and travel (right), visual and blind groups]
Fig. 3. Learning curves for manipulation and travel under the visual and blind conditions.

In analyzing the learning data, difference scores were obtained for each S by subtracting the scores made on Day 11 from the scores made on Day 2. Separate t tests were run on the visual and blind groups for manipulation and travel scores to determine whether the difference between Day 2 and Day 11 was significantly greater than zero. The results indicate that both the visual and the blind groups' performance on the manipulative component of the task was improved significantly by practice; the t value was significant beyond the 1 per cent level. The blind group showed a learning effect in the travel component that was significant at the 5 per cent level, while the visual group showed no significant learning effect in this component.
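Each of these tests is a one-sample t test on the Day 2 minus Day 11 difference scores, with n - 1 degrees of freedom. The minimal Python sketch below uses only the standard library. The scores in the usage block are invented. The sketch returns the t statistic only, because the article does not say whether a one- or two-tailed criterion was used.

```python
import math
import statistics


def difference_t(day2_scores, day11_scores):
    """t for H0: mean(Day 2 - Day 11) = 0, returned with its df (n - 1)."""
    if len(day2_scores) != len(day11_scores) or len(day2_scores) < 2:
        raise ValueError("need two equal-length lists of at least two scores")
    diffs = [a - b for a, b in zip(day2_scores, day11_scores)]
    n = len(diffs)
    t = statistics.mean(diffs) / (statistics.stdev(diffs) / math.sqrt(n))
    return t, n - 1


if __name__ == "__main__":
    day2 = [6.1, 5.8, 7.0, 6.4]   # invented manipulation times, seconds
    day11 = [4.0, 4.2, 4.9, 4.1]
    print(difference_t(day2, day11))
```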


Transfer 


We have examined the problem of transfer of training in this study in order to determine whether visual and blind learning have different influences on performance on the nontrained task. By the electronic methods used, we are able to determine whether these influences affect the manipulative component of the task, the travel component, or both.

Table 3
Mean Duration in Seconds for Skilled Performance and Transfer Performance

                                                      Day 11       Day 12
Condition                           Component         Training     Transfer
Visual training, blind transfer     Manipulative        3.66         6.65
                                    Travel              3.51         4.67
Blind training, visual transfer     Manipulative        5.87         3.87
                                    Travel              4.12         3.49

Transfer of training has been considered in 
two ways in this study. First, in the tradi- 
tional manner, pretest and transfer test scores 
for the two groups were compared to see if 
the two perceptual conditions in training had 
a differential effect. That is to say, the visual 
pretest and the visual transfer test for the 
blind-training group were compared with the 
blind pretest and the blind transfer test for 
the visual-training group. Second, the scores 
on the last day of training and the scores on 
the transfer-test day have been compared to 
see if the change in perceptual conditions was 
reflected in a performance change. 

An analysis of covariance was used to com- 
pare the pretest and transfer-test scores for 
the two groups. A simple analysis of variance 
showed that there was a statistically signifi- 
cant difference between the two groups on the 
pretest day, indicating that the group per- 
forming visually was superior for both ma- 
nipulation and travel. The same results were 
obtained from a simple analysis of variance 
of the transfer test scores. These results 
would be expected because of the two radi- 
cally different perceptual conditions. By 
means of the analysis of covariance, the pre- 
test difference between the groups was re- 
moved so that any differences remaining be- 
tween the two groups could be attributed to 
differences in the training conditions. How- 
ever, when the pretest differences were re- 
moved, the difference between the two groups 
on the transfer test was not significant. This 
finding indicates that training under neither
perceptual condition was more helpful than the
other in transfer to the untrained condition.
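The covariance adjustment described above can be sketched to show the arithmetic involved. This is the textbook one-covariate analysis of covariance (pretest as covariate, transfer score as outcome, one common within-group slope). The paper does not report its computational details, so this is an illustration, not the authors' procedure. The function name `ancova_one_covariate` and the scores are hypothetical, not data from the study.

```python
from statistics import fmean


def ancova_one_covariate(groups):
    """One-covariate analysis of covariance (hypothetical helper).

    groups maps a group name to (pretest_scores, transfer_scores).
    Returns (adjusted_means, pooled_within_slope, F, (df1, df2)).
    """
    all_x = [x for xs, _ in groups.values() for x in xs]
    all_y = [y for _, ys in groups.values() for y in ys]
    grand_x, grand_y = fmean(all_x), fmean(all_y)

    def cross_products(xs, ys, mx, my):
        sxx = sum((x - mx) ** 2 for x in xs)
        sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
        syy = sum((y - my) ** 2 for y in ys)
        return sxx, sxy, syy

    # Pool the within-group sums of squares and cross-products.
    wxx = wxy = wyy = 0.0
    for xs, ys in groups.values():
        sxx, sxy, syy = cross_products(xs, ys, fmean(xs), fmean(ys))
        wxx, wxy, wyy = wxx + sxx, wxy + sxy, wyy + syy

    slope = wxy / wxx  # common within-group regression of transfer on pretest
    adjusted = {
        name: fmean(ys) - slope * (fmean(xs) - grand_x)
        for name, (xs, ys) in groups.items()
    }

    # Residual sums of squares after removing the covariate, total vs within.
    txx, txy, tyy = cross_products(all_x, all_y, grand_x, grand_y)
    ss_within = wyy - wxy ** 2 / wxx
    ss_total = tyy - txy ** 2 / txx
    df1 = len(groups) - 1
    df2 = len(all_x) - len(groups) - 1
    f_ratio = ((ss_total - ss_within) / df1) / (ss_within / df2)
    return adjusted, slope, f_ratio, (df1, df2)


# Hypothetical scores for illustration only (not data from the study).
example = {
    "A": ([1, 2, 3, 4], [3, 3, 5, 9]),
    "B": ([3, 4, 5, 6], [12, 12, 14, 18]),
}
adjusted, slope, f_ratio, dfs = ancova_one_covariate(example)
print(adjusted, slope, round(f_ratio, 2), dfs)
```

With these made-up scores the unadjusted means differ by 9 points. After the pretest difference is removed, the adjusted means differ by 5. The F ratio tests that adjusted difference on 1 and N - 3 degrees of freedom.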

Table 3 shows the mean scores obtained by 
the groups on Day 11 and Day 12. It is ap- 
parent from the table that a change in per- 
ceptual conditions profoundly affects perform- 
ance, even after training. The visual group, 
when performing blind on Day 12, shows an 
increase in the mean duration of both travel 
and manipulation times. The blind group,
on the other hand, shows a decrease in mean 
duration of these times when performing 
visually on Day 12. Analyses of variance 


Dimensional Analysis of Motion: IX. 


indicate that these changes are significant for 
both groups for both the manipulative and 
travel components. 
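To make the size of these Day 11 to Day 12 shifts concrete, the Table 3 means can be differenced directly. The percentage change relative to Day 11 is a derived quantity the paper does not report. The significance statements above rest on the authors' analyses of variance, which this sketch does not reproduce. The names `TABLE_3` and `transfer_change` are illustrative.

```python
# Mean durations in seconds from Table 3: (Day 11 training, Day 12 transfer).
TABLE_3 = {
    ("visual training, blind transfer", "manipulative"): (3.66, 6.65),
    ("visual training, blind transfer", "travel"): (3.51, 4.67),
    ("blind training, visual transfer", "manipulative"): (5.87, 3.87),
    ("blind training, visual transfer", "travel"): (4.12, 3.49),
}


def transfer_change(day11, day12):
    """Return (change in seconds, change as a percentage of the Day 11 mean)."""
    delta = day12 - day11
    return round(delta, 2), round(100 * delta / day11, 1)


for (condition, component), (day11, day12) in TABLE_3.items():
    delta, pct = transfer_change(day11, day12)
    # Positive values mean slower (longer) movements on the transfer day.
    print(f"{condition:32} {component:13} {delta:+.2f} s ({pct:+.1f}%)")
```

Positive changes (the visually trained group working blind) are slowdowns. Negative changes (the blind-trained group working visually) are speed-ups.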

Table 4 presents the data in Table 3 re- 
arranged for ease of comparison between the 
two groups when performing the same task. 
The blind task scores are those for the blind-trained group on Day 11 and the visually-trained group on Day 12. The visual task
scores are those for the visually-trained group 
on Day 11 and the blind-trained group on 
Day 12. The analyses of variance indicate 
a significant difference at the 5 per cent level 
in favor of the blind-trained group on the 
blind task for both manipulation and travel. 
There was no significant difference between 
groups on the visual task. 


Discussion and Summary 


The role of perception in human motion is 
a broad problem touching upon many theo- 
retical and applied aspects of psychology. 
Currently this problem is a very lively one in 
the fields of human engineering and of time 
study in industry. We have made one spe- 
cial approach to this general problem in the 
present investigation in terms of observations 
of the effects of visual and blind conditions 
of performance upon the different component 
movements in human motions. In this ex- 
periment the problem of the role of perception 
in performance is brought into relation to the 
whole field of motion analysis. 

In order to conduct the present research, 
special electronic methods of motion analysis 
are applied to the measurement of the ma- 
nipulation and travel components of move- 
ment in a panel control task. The durations 
of these movements are measured separately 
under blind and visual conditions of performance.

The significant difference between the lev- 
els of skilled performance for the visual and 
blind groups for both components shows that 
in the present case perceptual conditions pre- 
dominate in the performance. Practice under 
the blind condition does not compensate for 
the loss of vision. It is unlikely that further 
practice would enable the blind group to equal 
the performance of the visual group since the 
learning curves appear to have leveled off. 

The results of the comparison of visual and blind skilled performance show that the role of perception is not confined to one aspect or component part of the task. Perceptual factors appear to affect both manipulation and travel. The blind individual is restricted in fine movements and in gross travel movements.

The data on acquisition indicate that both 
groups showed a significant improvement in 
performance on the manipulative component 
as a function of practice. A significant im- 
provement in travel movement with practice 
occurs only with the blind group. This im- 
provement may be explained most directly in 
terms of the perceptual difficulty of the blind 
task. Previous studies have shown that per- 
ceptual “loading” of a task will give rise to 
learning in the travel component which is 
otherwise not present to any great extent 
(3, 4). 

In the case of manipulation, loss of vision does not seem to affect the course of learning, but it prevents the attainment of a skill level equal to that achieved with vision. If a little speculation is allowed, it can be said that manipulation under blind conditions would never reach the level of visually controlled manipulation in this task.

The transfer data indicate that training under one perceptual condition is not more beneficial than training under the other when the groups are tested on the nontrained condition. Visual training facilitates blind performance and blind training facilitates visual performance, but the analysis of covariance does not indicate that there is a difference between the performance of the two groups which can be attributed to the perceptual difference in training.


Table 4

Comparison of Performance of Visual and Blind Groups
Performing the Same Task in Terms of
Mean Duration in Seconds

Task     Component      Blind-Trained Group   Visually Trained Group
Blind                   Day 11                Day 12
         Manipulation   5.87                  6.65
         Travel         4.12                  4.67
Visual                  Day 12                Day 11
         Manipulation   3.87                  3.66
         Travel         3.49                  3.51

The design of the transfer part of this 
study has provided the means of getting data 
on the problem of the automatizing of com- 
ponent movements in a learned motion. We 
wish to know whether either of the com-
ponent movements in the task has been
made automatic through learning to the ex- 
tent that change in the perceptual conditions 
will not change their level of performance. 
The data clearly show that neither of the two 
component movements was automatized to 
this extent. When changed to the blind con- 
dition, the visually trained group showed a 
deterioration in performance. When tested 
visually, the blind group showed an improve- 
ment in performance. The perceptual con- 
ditions as such outweigh any automatic fea- 
tures in the two movements resulting from 
learning. 

The role of perception in motion is one of the outstanding problems of industrial work. Long-standing theory and practice related to motion and time study in industry have dealt with this problem in only a superficial way, i.e., by the assumption of specific perceptual therbligs of search and select. The present study points up certain important facts about the role of perception in work motions. Perceptual conditions appear largely to define the specific properties of different therbligs or parts of the work task, and not just one part. Furthermore, the general perceptual conditions involving use of vision and its absence seem to have a predominant role, in comparison to learning, in defining the absolute level of performance of the different component movements of manipulation and travel in skilled motion.

The problems of motion analysis in in- 
dustry heretofore have not been brought into 
close relation with the varied phenomena of 
perception that have been studied in psychology. This experiment not only emphasizes the importance of such scientific investigation but also shows that electronic methods of motion analysis make possible the detailed study of motion in relation to perceptual factors.


Received July 21, 1955. 


References 


1. Davis, R., Wehrkamp, R., & Smith, K. U. Dimensional analysis of motion: I. Effects of laterality and movement direction. J. appl. Psychol., 1951, 35, 363-366.

2. Rubin, G., von Trebra, P., & Smith, K. U. Dimensional analysis of motion: III. Complexity of movement pattern. J. appl. Psychol., 1952, 36, 272-276.

3. Seymour, D. Manual skills and industrial productivity. Production Engineers J., 1954, 3-10.

4. Smader, R. The relation between perceptual complexity in a task and human motion. Unpublished doctor’s dissertation, Univer. of Wisconsin, 1955.


The Journal of Applied Psychology 
Vol. 40, No. 3, 1956 


A New Technique for Rapid Item Analysis¹


Carlos A. Cuadra

Veterans Administration Hospital, Downey, Illinois


The resurgence of interest in self-report 
techniques during the past 15 years has been 
closely associated with, if not one result of, 
an increasingly empirical approach to per- 
sonality assessment. With a number of 
standardized research instruments (2, 3, 4) 
providing a reservoir of items to tap impor- 
tant characteristics and attitudes, the horizons 
of empirical research are limited only by the 
fertility of the experimenter’s imagination, 
the availability of stable criterion groups, and 
—of special importance—facilities for devel- 
oping new measures through item analysis. 
Unfortunately, the development of new em- 
pirical scales is hampered by the inaccessi- 
bility to most clinician-researchers of the 
modern electronic equipment which can re- 
duce a laborious task to manageable proportions.

The purpose of this paper is to describe a 
simple new method to increase the speed and 
accuracy of item analysis without elaborate 
or expensive equipment. The method, thus far applicable only to Hankes-type (or Testscor) answer sheets, involves transferring individual item responses to two or three specially designed 8 by 8-inch cards from which item tallies can quickly be made. Figure 1 shows one such card, designed by the author and named the Item Record Card. The spaces along the edges of the card correspond to the spaces on the Hankes Answer Sheet.

With an Item Record Card placed directly under the first row of “true” responses on the answer sheet, these responses may be entered rapidly on the card by means of diagonal marks. At the end of the first row of 30 items, the card is turned 90 degrees and the next 30 items entered. When 120 items have been completed, the card is turned over and a second 120 items entered on the reverse side. A 472-item protocol from the California Psychological Inventory, for example, can be completely transferred to the two Item Record Cards needed in less than two and one-half minutes.

¹ Sponsored by the Veterans Administration and published with the approval of the Chief Medical Director. The statements and conclusions published by the author are the result of his own study and do not necessarily reflect the opinion or policy of the Veterans Administration.
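The card layout just described (30 items per edge, four edges per side after successive 90-degree turns, two sides per card) fixes where any item lands. The mapping below is a minimal sketch of that arithmetic. Which physical edge counts as "edge 1" and the helper names `locate_item` and `cards_needed` are assumptions for illustration, not part of the author's design.

```python
from math import ceil

ITEMS_PER_EDGE = 30
EDGES_PER_SIDE = 4                                  # card turned 90 degrees per row
ITEMS_PER_SIDE = ITEMS_PER_EDGE * EDGES_PER_SIDE    # 120
ITEMS_PER_CARD = 2 * ITEMS_PER_SIDE                 # front and reverse: 240


def locate_item(item):
    """Map a 1-based item number to (card, side, edge, position).

    Edge numbers follow the order of the 90-degree turns; which physical
    edge comes first is an assumption of this sketch.
    """
    if item < 1:
        raise ValueError("item numbers start at 1")
    card, within_card = divmod(item - 1, ITEMS_PER_CARD)
    side, within_side = divmod(within_card, ITEMS_PER_SIDE)
    edge, position = divmod(within_side, ITEMS_PER_EDGE)
    return card + 1, ("front", "back")[side], edge + 1, position + 1


def cards_needed(n_items):
    """Number of Item Record Cards needed for an n-item protocol."""
    return ceil(n_items / ITEMS_PER_CARD)


print(cards_needed(472), locate_item(472))
```

At 240 items per card, a 472-item protocol needs two cards, which agrees with the example in the text.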

Once a series of protocols is entered on 
cards, an item tally is quite simple. The 
cards are laid in a column, exposing only the 
marked edges, and the total number of “true” 
responses to each item may be obtained easily 
by running down the successive columns.
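The column-by-column tally amounts to counting, for each item, how many protocols carry a "true" mark. Here is a small sketch, assuming each subject's card record is represented as the set of item numbers marked "true". The function name `tally_true` and the sample protocols are hypothetical.

```python
from collections import Counter


def tally_true(protocols, n_items):
    """Count "true" responses per item across a set of protocols.

    Each protocol is the set of item numbers marked "true" on one
    subject's card(s); the result lists counts for items 1..n_items.
    """
    counts = Counter()
    for true_items in protocols:
        counts.update(set(true_items))   # each subject counts once per item
    return [counts[item] for item in range(1, n_items + 1)]


# Three hypothetical protocols on a 5-item inventory.
print(tally_true([{1, 3}, {3, 4}, {2, 3, 4}], 5))
```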

The method described has a number of im- 
portant advantages over other nonmechanical 
methods. First and perhaps most important 
is the saving in time and personnel. An 
ordinary item analysis of, say, 25 versus 25 
MMPI protocols usually takes one worker 
from 15 to 20 hours. With two persons— 
one reading the “true” responses while the 
other records—the job can be reduced to 
from 12 to 15 man-hours. Use of the new 
method described allows one person to do the 
work in only four to five hours. 


[Figure: facsimile of the Item Record Card, headed “The Item Record Card: A Technique for Rapid Item Analysis on Hankes-type Answer Sheets”]

Fig. 1. The Item Record Card.


A second advantage lies in the method of 
recording. Since it is entirely visual and does 
not involve reading out and recording indi- 
vidual item numbers, the possibility of read- 
ing or recording error is sharply reduced. 

A third advantage may accrue whenever 
more than one criterion is used for the same 
subjects. In such instances the subjects’ 
cards need only be re-sorted before tallying 
the responses of the new criterion group. For 
example, in a study in progress on the selec- 
tion of psychiatric aides, the two important 
criterion variables were (a) tenure, and (b) 
good ward performance. Although separate 
item analyses were carried out for each vari- 
able, a number of long-tenure aides quite 
naturally also fell in the good performance 
group, and since their item responses were 
already available on Item Record Cards, it 
was not necessary to record them again. 
Once entered on the cards, subjects’ responses 
may without further effort become part of 
any number of possible criterion groups. 
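The re-sorting advantage can be sketched in code: responses are recorded once, and each criterion only changes which subjects fall in the criterion group. The sketch compares proportions of "true" responses rather than raw counts, because criterion groups formed this way need not be equal in size. That choice, the helper `endorsement_differences`, and the aide records are illustrative assumptions, not the study's data or scoring rule.

```python
def endorsement_differences(attributes, responses, in_criterion, n_items):
    """Per-item difference in proportion "true": criterion group minus the rest.

    attributes: subject id -> dict of criterion information
    responses:  subject id -> set of item numbers answered "true"
    in_criterion: predicate deciding criterion-group membership
    """
    criterion = [responses[s] for s, a in attributes.items() if in_criterion(a)]
    others = [responses[s] for s, a in attributes.items() if not in_criterion(a)]
    if not criterion or not others:
        raise ValueError("both groups need at least one subject")

    def proportions(group):
        return [sum(item in r for r in group) / len(group)
                for item in range(1, n_items + 1)]

    return [round(c - o, 3)
            for c, o in zip(proportions(criterion), proportions(others))]


# Hypothetical aide records: responses are entered once and reused
# for every criterion split.
aides = {
    "a1": {"long_tenure": True, "good_ward": True},
    "a2": {"long_tenure": True, "good_ward": False},
    "a3": {"long_tenure": False, "good_ward": True},
    "a4": {"long_tenure": False, "good_ward": False},
}
true_items = {"a1": {1, 2}, "a2": {1}, "a3": {2, 3}, "a4": {3}}

by_tenure = endorsement_differences(aides, true_items, lambda a: a["long_tenure"], 3)
by_ward = endorsement_differences(aides, true_items, lambda a: a["good_ward"], 3)
print(by_tenure, by_ward)
```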

One cautionary note should be registered 
at this point. There may be some tempta- 
tion to utilize the Item Record Card itself as 
an answer sheet, with the subject simply 
checking the items to which his response is 
“true.” While this would certainly eliminate one operation from the steps involved in item analysis, the change from a true-false to a check/no-check task might introduce serious
distortions into the data. As pointed out in 
another article (1), the reduction of any psy- 
chological rating task to a check-list method 
introduces a number of important and for 
the most part unmeasurable psychological 
variables into the rating situation. Used only 
as suggested above, however, the Item Record 
Card may prove a useful adjunct to empirical 
research with personality questionnaires and 
other self-rating techniques. 


Received August 8, 1955. 


References 


1. Cuadra, C. A., & Reed, C. F. Problems in the interpretation of check-list data. Unpublished study. Author, 1956. (Mimeographed)

2. Gough, H. G. A preliminary guide for the California Psychological Inventory. Berkeley: Univer. of California Institute of Personality Assessment & Research, 1954. Pp. 1-55. (Mimeographed)

3. Hathaway, S. R., & McKinley, J. C. A multiphasic personality inventory (Minnesota): I. Construction of the schedule. J. Psychol., 1940, 10, 249-254.

4. Jurgensen, C. E. Report on the “Classification Inventory,” a personality test for industrial use. J. appl. Psychol., 1944, 28, 445-460.




A Methodological Note on Time Intervals Between
Consecutive Accidents


Alexander Mintz 


It was pointed out in an earlier paper (2) that the study of time intervals between consecutive accidents of individuals is a suitable method of establishing whether or not accidents have an influence on subsequent accident proneness. This is an issue of some importance. The arguments in favor of individual differences in accident proneness which make use of distributions of accident frequencies among people generally presuppose that accident proneness is uninfluenced by accidents. On the other hand, Horn (1), who studied time intervals between accidents of airplane pilots, concluded that their accident proneness tended to be temporarily increased by accidents and recommended readjustment procedures after accidents. In the earlier study, data pertaining to time intervals between accidents of taxi drivers¹ were analyzed by methods differing from Horn’s, and it was concluded that in their case there was no evidence of increased accident proneness after accidents.

The methods of dealing with the data in the earlier paper involved comparisons of time intervals before the first accidents, between early accidents, and between later accidents of the same individuals. A different method of dealing with the data will be explained and demonstrated in this paper: it will be shown how the frequency distribution of time intervals between a particular pair of accidents may be compared to a theoretical distribution of such time intervals. This theoretical distribution is derived from the assumptions that accident proneness is constant in time and that accidents occur purely at random over the time interval representing the observation period. The material prepared to illustrate an application of such a method will also be compared to the

¹ The data had been kindly contributed by Professor E. Ghiselli, University of California.


City College of New York 


material presented by Horn in his study of 
airplane pilots. 

Contrary to what one might suppose, the 
random distribution of a number of events 
over an observation period contains quite un- 
equal numbers of short time intervals and 
long time intervals between consecutive events. 
The earlier paper already cited (2) summa- 
rizes the mathematical rationale, lists refer- 
ences, and gives a formula for the probability 
distribution of time intervals of different 
durations. If n accidents happen to each of
a number of persons during an observation
period of duration D, the probability density of a time
interval x between consecutive accidents
within this period is given by the equation


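$$y = \frac{n}{D}\left(1 - \frac{x}{D}\right)^{n-1}$$

The probability of $x$ being between the limits $x_1$ and $x_2$ is then given by the definite integral

$$P(x_1 < x < x_2) = \int_{x_1}^{x_2} \frac{n}{D}\left(1 - \frac{x}{D}\right)^{n-1} dx$$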
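The claim that random scattering alone yields a density like this one can be checked by simulation. The sketch below assumes, as the text does, that each person's n accidents fall independently and uniformly within an observation period, with D set to 1. It compares the simulated share of first-to-second intervals shorter than 1/13 of the period with the integral above, which evaluates to 1 - (1 - 1/13)^n. The trial count, the seed, and the helper names are arbitrary choices of this sketch.

```python
import random


def simulated_share_short(n, cutoff, trials=100_000, seed=1):
    """Share of simulated first-to-second intervals shorter than `cutoff`.

    Each trial scatters n accident times uniformly over an observation
    period of length 1 (D = 1) and measures the gap between the first two.
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        times = sorted(rng.random() for _ in range(n))
        if times[1] - times[0] < cutoff:
            hits += 1
    return hits / trials


def theoretical_share_short(n, cutoff):
    """Integral of n(1 - x)^(n - 1) from 0 to cutoff, with D = 1."""
    return 1 - (1 - cutoff) ** n


for n in (2, 5):
    print(n, round(simulated_share_short(n, 1 / 13), 3),
          round(theoretical_share_short(n, 1 / 13), 3))
```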


If a group of people had varying numbers of 
accidents, the probability of the time interval 
between, e.g., their first and second accident 
being between x, and x2 is given by a weighted 
average of definite integrals of the above 
form. The weights correspond to the num-
bers of people with varying n’s, i.e., with two
accidents each, three accidents each, etc. One 
can then tabulate the numbers of time inter- 
vals between, e.g., second accidents falling 
within certain time limits after the first acci- 
dent, and compare these empirical frequencies 
to the theoretical ones, as just explained. 
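The weighted computation can be written out for the taxi-driver counts of Table 1, using D = 1 and thirteen four-week slices, as the text goes on to describe. With D = 1 each definite integral reduces to (1 - x1)^n - (1 - x2)^n. This is a sketch in floating point rather than the paper's five-place logarithms, so its values sit close to, but not exactly on, the published theoretical column of Table 2 (about 21.2 rather than 21.1 for the first four weeks). The helper names are illustrative.

```python
# Table 1: number of accidents per driver -> number of drivers (67 drivers).
TABLE_1 = {2: 16, 3: 13, 4: 13, 5: 4, 6: 5, 7: 3, 8: 2, 9: 3,
           11: 2, 12: 1, 13: 1, 15: 1, 16: 1, 18: 1, 25: 1}


def interval_probability(n, lo, hi):
    """Integral of n(1 - x)^(n - 1) from lo to hi (observation period D = 1)."""
    return (1 - lo) ** n - (1 - hi) ** n


def theoretical_frequencies(drivers_by_accidents, bins=13):
    """Expected first-to-second intervals in each 1/bins slice of the year."""
    return [
        sum(drivers * interval_probability(n, k / bins, (k + 1) / bins)
            for n, drivers in drivers_by_accidents.items())
        for k in range(bins)
    ]


freqs = theoretical_frequencies(TABLE_1)
print([round(f, 1) for f in freqs], round(sum(freqs), 1))
```

Because the slices cover the whole year, the expected frequencies add up to the 67 drivers.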

As an example, the times between the first 
and second accidents of the group of taxi 
drivers already discussed in the earlier paper 
were examined by the new method. The ac- 
cidents in which the drivers were involved 
were listed as taking place in particular weeks
of the year; as an approximation, accidents
were assumed to happen in the middle of the 
week, and time intervals between the acci- 
dents were computed accordingly. Sixty- 
seven of the total group of 162 drivers had 
two or more accidents. The numbers of 
drivers with two accidents, three accidents, 
etc. are listed in Table 1. 

The time intervals between the accidents 
were classified in four-week periods. The 
corresponding theoretical frequencies were 
computed as described earlier. Since the observation period was one year, and four weeks is almost exactly one-thirteenth of a year, D was set equal to 1, and the limits $x_1$ and $x_2$ in the definite integral became 0 and 1/13, 1/13 and 2/13, and so on. In accordance with the figures presented in Table 1,
the formula used in order to compute the 
theoretical frequency of time intervals of less 
than four weeks between the first and second 
accidents was therefore 


$$16 \times 2\int_0^{1/13}(1-x)\,dx + 13 \times 3\int_0^{1/13}(1-x)^2\,dx + 13 \times 4\int_0^{1/13}(1-x)^3\,dx + 4 \times 5\int_0^{1/13}(1-x)^4\,dx,$$


etc. For the computations of the theoretical 
frequencies of time intervals of longer dura- 
tion, e.g., five to eight weeks, the same for- 
mula was used, but with different pairs of 
limits for the definite integrals, e.g., 1/13 
and 2/13, etc. The obtained and theoretical 
frequencies of the time intervals thus com- 
puted are presented in Table 2. 

It is apparent that the obtained and the theoretical frequencies are quite similar, and the chi-square test indicated that the differences between them do not approach statistical significance.² Thus the conclusion of an apparent lack of an effect of accidents on accident proneness in this set of data, which was reached in the earlier paper, is confirmed by the new method.

² In view of the nonsignificant differences between the theoretical and the empirical distributions, seasonal fluctuations that may have been present were not investigated. The small and nonsignificant excess of less-than-four-weeks time intervals in the empirical distribution may have been due to this factor.
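The paper reports only that the chi-square test was nonsignificant. It gives neither the statistic nor how sparse cells were grouped. The sketch below is one illustrative way to run the test on Table 2. It keeps the first four four-week periods separate and pools everything later into one cell using the column totals (67 actual, 66.6 theoretical), which also sidesteps an illegible row. That pooling rule is an assumption, not the author's procedure. The value 9.488 is the standard tabled 5 per cent point of chi-square with 4 degrees of freedom.

```python
def chi_square(observed, expected):
    """Pearson chi-square statistic for paired observed/expected counts."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Table 2: first four 4-week periods kept separate; all later periods
# pooled into one cell using the column totals (pooling is an assumption).
observed = [26, 11, 10, 6]
expected = [21.1, 13.0, 9.4, 6.5]
observed.append(67 - sum(observed))       # 14 intervals of 17 weeks or more
expected.append(66.6 - sum(expected))     # about 16.6 expected
statistic = chi_square(observed, expected)
degrees_of_freedom = len(observed) - 1
CRITICAL_05_DF4 = 9.488                   # tabled 5 per cent point, 4 df
print(round(statistic, 2), degrees_of_freedom, statistic < CRITICAL_05_DF4)
```

Under this grouping the statistic is far below the 5 per cent point, which agrees with the paper's verbal conclusion. Other reasonable groupings would give somewhat different values.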

In addition, Table 2 exhibits another fact: 
Both in the case of the actual and the theo- 
retical frequencies there is a marked pre- 
ponderance of short time intervals over long 
ones. In the case of the theoretical fre- 
quencies, almost a third of the time intervals 
are less than one month, over one-half less 
than two months. This large number of short time intervals between consecutive accidents compared to long ones follows directly from the fact that the equation given earlier,

$$y = \frac{n}{D}\left(1 - \frac{x}{D}\right)^{n-1},$$

is a monotonically decreasing function of x for all n’s greater than one. Any weighted sum of such functions with positive weights also decreases monotonically with x.
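The monotonicity claim can be checked with one differentiation. In the lines below, $w_n$ stands for the number of persons with $n$ accidents. This symbol is introduced here for illustration and is not used in the paper.

```latex
\frac{d}{dx}\left[\frac{n}{D}\left(1-\frac{x}{D}\right)^{n-1}\right]
  = -\frac{n(n-1)}{D^{2}}\left(1-\frac{x}{D}\right)^{n-2} < 0
  \qquad (0 \le x < D,\; n > 1)

\frac{d}{dx}\sum_{n} w_n\,\frac{n}{D}\left(1-\frac{x}{D}\right)^{n-1}
  = -\sum_{n} w_n\,\frac{n(n-1)}{D^{2}}\left(1-\frac{x}{D}\right)^{n-2} < 0
  \qquad (w_n > 0)
```

For example, with D = 1 and n = 2 the density is 2(1 - x). It falls linearly from 2 at x = 0 to 0 at x = 1.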

In his study of airplane pilots Horn had used a similar preponderance of short time intervals over long ones as his principal argument for the view that their accident proneness increased after accidents. The discussion of the preceding paragraph shows that such a finding is entirely inconclusive: it can be duplicated by a theoretical distribution of time intervals based on the assumption that accidents occur at purely random times. In order to establish that accidents increase accident proneness, one has to show that the excess of short intervals over long ones is significantly greater than is the case in the theoretical distribution. Alternatively, one can compare times between earlier and later accidents of the same people. Horn’s paper contains no tabulation of the time intervals that would enable one to make the latter comparison. A tabulation of frequencies of pilots with varying numbers of accidents during the observation period is also lacking, so that the theoretical random distribution of time intervals cannot be computed. In the absence of such information it is impossible to tell whether the accident proneness of airplane pilots (unlike that of taxi drivers) does increase after accidents.


Table 1

Accident Distribution of 67 Drivers with
Two or More Accidents

Number of Accidents   Number of Drivers
 2                    16
 3                    13
 4                    13
 5                     4
 6                     5
 7                     3
 8                     2
 9                     3
11                     2
12                     1
13                     1
15                     1
16                     1
18                     1
25                     1
Total                 67
Summary


It is shown how information about numbers of people with varying numbers of accidents, together with the assumption that accidents happen to people at random times, may be used to compute a theoretical distribution of time intervals between consecutive accidents. An obtained distribution of such time intervals may then be compared to the theoretical one. As an example, a distribution of time intervals between first and second accidents of a group of taxi drivers was examined; it was not significantly different from the theoretical distribution.

It is pointed out that in the theoretical distribution of time intervals between consecutive accidents, short time intervals are much more frequent than long ones. Horn had previously used a finding of this type as an argument for the view that airplane pilots become accident prone after accidents; the argument is entirely inconclusive.


Table 2

Frequencies of Second Accidents of Taxi Drivers
Occurring Within Various Periods of Time
After the First Accident, and
Theoretical Frequencies

(Data collected by Dr. E. Ghiselli)

Week After First Accident   Actual and Theoretical* Frequencies
1st to 4th                  26   (21.1)
5th to 8th                  11   (13.0)
9th to 12th                 10   (9.4)
13th to 16th                 6   (6.5)
17th to 20th                 4   (4.8)
21st to 24th                 4   (3.7)
25th to 28th                 0   (2.7)
29th to 32nd                 0   (1.9)
33rd to 36th                [illegible]
37th to 40th                 1   (1.0)
41st to 44th                 0   (0.6)
45th to 48th                 1   (0.3)
49th to 52nd                 0   (0.1)

Totals                      67   66.6

*The theoretical frequencies, shown in parentheses, were computed by five-place logarithms and rounded off.


Received August 12, 1955. 


References 


1. Horn, D. A study of pilots with repeated accidents. J. aviat. Med., 1947, 18, 440-449.

2. Mintz, A. Time intervals between accidents. J. appl. Psychol., 1954, 38, 401-407.




A Note on the “Fakability” of the Minnesota Teacher
Attitude Inventory¹


A. Garth Sorenson 
School of Education, University of California, Los Angeles 


This investigation was undertaken to discover whether or not prospective teachers can deliberately change their responses to the Minnesota Teacher Attitude Inventory (MTAI) in such a manner as to improve their total scores significantly. A secondary question concerns the effect of signing versus not signing the answer sheet: will students who sign their names be more or less inclined to fake than those who do not sign? There are several reasons why such a study is important. Of immediate concern to the investigator was the question of whether the inventory, which is designed to predict the ability of teachers to effect harmonious interpersonal relations in the classroom (1, 2), is likely to be of value in the selection of candidates for teaching credentials. Obviously, if it can be readily “faked,” its value as a selection device will be limited. In at least one study, evidence is presented to indicate that the MTAI can be faked by college students (3). In another study, where the MTAI was administered at the beginning and at the end of a course, it was found that the students’ scores changed in the “right” direction. It was not determined whether the course effected real changes in attitudes or merely made the students “test-wise” so far as the MTAI is concerned (4).

Sample and Administrative Procedure 


The subjects of the present study were 406 prospective teachers, elementary and secondary, in the School of Education at U. C. L. A. About half were inventoried in the fall semester, 1954, the remainder in the spring semester, 1955. The prospective elementary teachers were enrolled in two sections (one each semester) of a class in child growth and development. The prospective secondary teachers were enrolled in four sections (two each semester) of a class in principles of guidance. Since the above courses are required of the respective groups of candidates for teacher credentials, the sample is probably representative of students in the School of Education at U. C. L. A.

¹ This study was supported in part by the Fund for Occupational Research of the School of Education, U. C. L. A. Martin S. Sheldon assisted in the gathering and tabulation of data.

The inventory was administered during a regular 
class hour. No advance announcement was made. 
As each student entered the classroom, he was handed 
the inventory, an electrographic pencil and two an- 
swer sheets. The answer sheets, bearing duplicate 
numbers, were labeled “A” and “B.” He also re- 
ceived one of two sets of typewritten instructions. 
One set of instructions, distributed to alternate stu- 
dents as they entered the classroom, included these 
directions: 

“The purpose of this exercise is to furnish the in- 
structor with information regarding some of the 
attitudes of this class. You have been given two 
answer sheets. Be sure to put your name on both 
of them. Please indicate your age and sex. Then
put the “B” answer sheet aside. When you have 
finished filling out the inventory, hold up your hand.” 

The other half of the students received the same 
instructions except they were told it would not be 
necessary to sign their names. 

As soon as a student held up his hand to signify 
completion of the inventory, his “A” answer sheet 
was collected and he was handed a set of instruc- 
tions for the “B” answer sheet, reading as follows: 

“Will you please imagine yourself to be an ap- 
plicant for a teaching position in a school system 
which is known to prefer ‘progressive’ teachers and 
fill out the inventory in such a way as to make a 
good impression.” 


Results 


The effect of instructions to make a good 
impression, i.e., to fake, was studied by group- 
ing the data as indicated in Table 1. It will 
be noted that in each of the groups the mean 
faked score is higher than the mean original 
score and that the difference in means is in 
each case significant at the .001 level of 
confidence. 

A comparison of the unsigned and signed 
answer sheets produced the following data: 
The mean original score for 204 students who 
did not sign the answer sheet was 41, SD 29. 
Their mean faked score was 70, SD 28. 

The mean original score for 202 students
who signed the answer sheets was 46, SD 28.
Their mean faked score was 71, SD 29. 

The difference in mean original scores, 
signed versus unsigned was 5, critical ratio 
1.79, level of confidence .05, one-tailed test. 

The difference in mean faked scores, signed 
versus unsigned was 1, critical ratio .36, not 
statistically significant. 

The effect of signing one’s name versus not 
signing was further checked by the following 
procedure. After scoring, the original and 
faked answer sheets were paired according to 
the duplicate numbers on the answer sheets. 
In each case, the original or “A” score was 
subtracted from the “B” or faked score to 
give a “gains” score. The gains scores were
tabulated separately for the signed and un-
signed groups.

The range of gains scores on the 204 unsigned answer sheets was from -59 to 147, with a mean of 30.8 and an SD of 28.4.

The range of gains scores on the 202 signed answer sheets was from -53 to 111, with a mean gain of 23.7 and an SD of 23.7.

The difference in gains scores for the two 
groups was thus 7.1, critical ratio 2.73, level 
of confidence .01. 
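The signed-versus-unsigned comparisons are differences between two independent groups. The usual critical ratio for such a comparison divides the difference in means by its standard error, the square root of SD1²/N1 + SD2²/N2. The paper does not state its formula, so this is an assumed reconstruction. Applied to the rounded summary statistics reported above, it gives about 2.74 for the gains (reported 2.73), 1.77 for the original scores (reported 1.79), and 0.35 for the faked scores (reported .36). These small gaps are what rounded inputs would produce. The formula does not apply to the original-versus-faked comparisons in Table 1, which are paired scores from the same students and would also need the correlation between them.

```python
from math import sqrt


def critical_ratio(mean_1, sd_1, n_1, mean_2, sd_2, n_2):
    """Difference between two independent means divided by its standard error."""
    standard_error = sqrt(sd_1 ** 2 / n_1 + sd_2 ** 2 / n_2)
    return (mean_1 - mean_2) / standard_error


# Summary statistics as reported in the text (means and SDs are rounded).
gains = critical_ratio(30.8, 28.4, 204, 23.7, 23.7, 202)   # unsigned vs signed
original = critical_ratio(46, 28, 202, 41, 29, 204)        # signed vs unsigned
faked = critical_ratio(71, 29, 202, 70, 28, 204)           # signed vs unsigned
print(round(gains, 2), round(original, 2), round(faked, 2))
```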

Discussion 

When instructed to fake the MTAI, many 
of the students in the above samples made 
very large gains over the scores they had 
achieved under standard directions. A few 
made very large losses. (Perhaps the latter
assigned a different meaning to the term
“progressive” than did the former.) The 
majority improved their scores, as indicated 
by the fact that under instructions to fake, 
the various groups raised their mean scores 
approximately one standard deviation. (The 
difference is significant at the .001 level of 
confidence.) 

Thus it would appear that the answer to 
the question, “Can prospective teachers fake 
the MTAI?” is a qualified “Yes.” Some of 
the students in the above samples faked more 
successfully than others and some faked in 
the wrong direction, but most were able to 
better their scores significantly. 

To the question, “Does signing the answer
sheets make a difference?” the answer is
also a qualified “Yes.” As a group, the non- 
signers made lower scores under standard 
directions than did the signers, and conse- 
quently larger gains scores. This might in- 
dicate that some of the nonsigners were more 
frank, less concerned about giving a socially 
acceptable response, than were the signers. 

Perhaps most significant in the framework of this study is the range of the gains scores. As noted above, when students took the inventory a second time under changed directions, the general tendency was to improve one’s score, but one student lowered his score by 59 points, while another increased his by 147. This is a striking illustration of the commonplace knowledge that such factors as a student’s beliefs regarding the use to which the scores will be put and his understanding of directions may influence his response to an inventory.


Table 1

A Comparison of Original and Faked Scores

                            Mean      SD        Mean     SD       Difference  Critical  Level of
Group                  N    Original  Original  Faked    Faked    in Means    Ratio     Significance
                            Score     Score     Score    Score
Prospective elementary teachers
  Fall                81    51        22        72       22       21          10.4      .001
  Spring              72    45        28        71       30       26           8.6      .001
Prospective secondary teachers
  Fall               113    41        30        70       31       29          12.2      .001
  Spring             140    41        30        70       28       29          14.2      .001

This study does not answer the important 
question, “Will prospective teachers fake the 
MTAI?” However, in view of the above data 
it would appear that one who proposes to use 
the MTAI in a selection program would do 
well to carefully consider the problem of how 
to get the cooperation of his subjects. At 
least in some cases, if the respondent sees rea- 
son to cooperate, e.g., believes that his scores 
are to be used in counseling, he may answer 
the inventory quite differently than he would 
were he to believe that his responses would 
affect his chances of obtaining a job which he 
wanted, or of gaining admission to a training 
program which he desired to enter. 


Summary 


This study was designed to investigate the 
question of whether prospective teachers can 
deliberately “fake” the MTAI, and to learn 
whether the fact that a student signs (or does 
not sign) his name is likely to influence his 
score.

Four hundred six prospective elementary 
and secondary teachers completed the MTAI, 
first under standard directions, and then un- 
der directions to “fake” the inventory. Half 
of the students signed their names to the an- 


A. Garth Sorenson 


swer sheets, the other half did not. When 
the original and “faked” scores of each stu- 
dent were compared it was found that some 
students had improved their original scores 
greatly, while others had changed their scores 
but in the wrong direction. The group means 
had increased, the difference being significant 
at the .001 level of confidence. 

It appears that signing the answer sheets 
does have an effect, at least in the case of 
some students. The mean original score of
the nonsigners was lower than that of the
signers, while the mean “faked” scores were 
approximately the same. Thus, the “gains” 
score for the nonsigners was somewhat higher, 
the difference being significant at the .01 level 
of confidence. 


Received June 28, 1955. 


References 


1. Cook, W. W., Leeds, C. H., & Callis, R. The 
Minnesota Teacher Attitude Inventory. New 
York: Psychological Corporation, 1951. 

2. Leeds, C. H. A scale for measuring teacher-pupil 
attitudes and teacher-pupil rapport. Psychol. 
Monogr., 1950, 64, No. 6 (Whole No. 312). 

3. Rabinowitz, W. The fakeability of the Minne- 
sota Teacher Attitude Inventory. Educ. psy- 
chol. Measmt, 1954, 14, 657-664. 

4. Shaw, J., Klausmeier, H. J., Lukes, A. H., & Reid, 
H. T. Changes occurring in teacher-pupil 
attitudes during a two-weeks guidance work- 
shop. J. appl. Psychol., 1952, 36, 304-306. 


The Journal of Applied Psychology
Vol. 40, No. 3, 1956


A Note on Measuring “Understandability” 


Robert F. Lockman¹


Bureau of Naval Personnel 


Although numerous articles on readability 
measurement have been published, the meas- 
urement of prose intelligibility or “under- 
standability” has received little attention. 
Flesch (1) has pointed out that readability 
measures will not indicate whether the ideas 
expressed are nonsense—or ungrammatical, it 
might be added. Consequently, a reliable 
measure of understandability would be an im- 
portant supplement to readability, especially 
where readability estimates lack relevance for 
a particular group. Theoretically, wherever 
assessed understandability is low, regardless 
of measured readability level, revision to im- 
prove comprehension of the material in ques- 
tion is indicated. 

To this end, the author devised an experi- 
mental rating scale with seven categories com- 
parable to the standard style descriptions 
used by Flesch (1): very easy, easy, fairly 
easy, standard, fairly difficult, difficult, and 
very difficult. The rating form instructs the 
subject to check one of these descriptions 
with respect to his judgment of the under- 
standability of the material being analyzed. 
For example: “In regard to the material you 
have just read, how hard to understand did 
you think it was? Check one of the follow- 
ing to show your over-all judgment.” The 
instructions and style descriptions can be pre- 
sented orally if necessary. 

Data so obtained are suitable for analysis 
with modal ratings, their corresponding aca- 
demic grade levels, and percentages of ratings 
in each category. Comparisons then can be 
made with reading ease (RE) scores trans- 
lated into grade levels and style descriptions. 
Correlations between RE scores and under- 
standability ratings can be meaningful if scale 


¹ The author was attached to the Aviation Psychology Laboratory,
U. S. Naval School of Aviation Medicine when the data cited in this
note were collected. Opinions or conclusions herein do not necessarily
reflect the views or possess the endorsement of the Navy Department.


directions, types of material analyzed, and 
composition of the rating group are carefully 
considered. 
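A minimal sketch of the tabulation just described follows: the percentage of ratings in each category, the modal rating, and a lookup that translates an RE score into a style description for comparison. The Reading Ease band edges used here (90, 80, 70, 60, 50, 30) are the ones commonly cited from Flesch's work and are an assumption of this sketch, as are the function names; grade-level equivalents are left out.

```python
from collections import Counter

# The seven style descriptions of the experimental scale, in order;
# the note codes them 1 ("very easy") through 7 ("very difficult").
CATEGORIES = [
    "very easy", "easy", "fairly easy", "standard",
    "fairly difficult", "difficult", "very difficult",
]


def summarize_ratings(ratings):
    """Return (modal category, {category: per cent of ratings})."""
    counts = Counter(ratings)
    unknown = set(counts) - set(CATEGORIES)
    if unknown:
        raise ValueError(f"unrecognized ratings: {sorted(unknown)}")
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no ratings supplied")
    percentages = {c: 100 * counts[c] / total for c in CATEGORIES}
    # max() keeps the first maximum it meets, so ties go to the easier category
    modal = max(CATEGORIES, key=lambda c: counts[c])
    return modal, percentages


def flesch_style(re_score):
    """Map a Reading Ease score to a style description (assumed band edges)."""
    bands = [(90, "very easy"), (80, "easy"), (70, "fairly easy"),
             (60, "standard"), (50, "fairly difficult"), (30, "difficult")]
    for lower, label in bands:
        if re_score >= lower:
            return label
    return "very difficult"


# Hypothetical ratings of one set of test directions, not data from the study
ratings = ["very easy"] * 60 + ["easy"] * 25 + ["fairly easy"] * 10 + ["standard"] * 5
modal, pct = summarize_ratings(ratings)
print(modal, {c: round(p) for c, p in pct.items() if p})
print(flesch_style(65.0))
```

Putting both measures on the same seven labels is what makes the side-by-side comparison of rating and RE descriptions possible.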

To cite briefly an application of the fore- 
going technique, RE scores for nine sets of 
directions on standard psychological tests 
used with Naval Aviation Cadets were com- 
puted with the simplified Flesch formula (2). 
From 129 to 273 cadets (median of 171) rated 
these same materials, the number varying 
with test administration schedules. All cadets 
entered the Naval Aviation Cadet Training 
Program during February and March, 1954, 
were from 18 to 25 years of age, had two 
years of college or its equivalent, and had 
been selected on the basis of the Navy Flight 
Aptitude Rating battery and a stringent 
physical examination. Because cadets are 
highly selected and relatively homogeneous, 
Flesch RE estimates for the materials used 
are not uniformly relevant. Nevertheless, 
legitimate comparisons of RE and under- 
standability data can be made within the 
sample. Directions were analyzed for tests 
of academic aptitude, spatial orientation, atti- 
tudes, temperament, and personality. 

Flesch RE style descriptions ranged from 
“fairly easy” to “difficult,” the mode being 
“standard” (8th and 9th grade level). Only 
the Frenkel-Brunswik F-Scale instructions 
were at the “college” RE level. In contrast, 
modal understandability ratings of “very 
easy” (53 to 74 per cent of the ratings) oc- 
curred for all materials except the Navy 
Spatial Apperception Test instructions. In 
this case, 32 per cent rated them “standard” 
and 26 per cent “fairly easy.” Both RE 
scores and understandability ratings had rela- 
tively little spread, and both indicated that 
the test directions or instructions could be 
readily comprehended by cadets. However, the two techniques showed
definite discrepancies, both in the general style descriptions they
assigned and in the special cases noted.




Although selection factors would be expected to depress the coefficients,
rank-order and product-moment correlations were computed to give some
indication of the relationships between RE scores and understandability
ratings. Mean understandability values for each set of test directions
were computed by multiplying each rating-category code (1 for “very
easy” through 7 for “very difficult”) by the number of raters choosing
that category, summing these products, and dividing by the total number
of raters. These values were ranked from low to high. RE scores were
ranked from high to low, since the higher the score, the more readable
the material. To correct for this scale reversal in the product-moment
computations, each RE score was subtracted from 100 and the difference
used as the score. The rho was -.65, significant at the .05 level in
Olds’s tables (3). The product-moment coefficient was -.52, which, with
an n of 9, is not significantly different from zero. Since these two
coefficients are similar in magnitude and direction, it would appear
that RE scores and understandability ratings were not measuring the
same thing.
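The computations described above can be sketched as follows: a mean understandability value from category counts coded 1 through 7, RE scores reversed by subtracting them from 100 so that both measures run from easier to harder, a Spearman rho from ranks, and the usual t test of a product-moment r against zero. The function names and the example counts and scores are hypothetical. The rho formula shown is exact only when there are no ties, and the sketch does not reproduce Olds’s tables.

```python
import math


def mean_understandability(counts):
    """counts[i] = number of raters choosing category code i + 1 (codes 1..7)."""
    if len(counts) != 7:
        raise ValueError("expected counts for seven categories")
    total = sum(counts)
    return sum(code * n for code, n in enumerate(counts, start=1)) / total


def average_ranks(values):
    """Rank values from low (1) to high, giving tied values their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        for k in range(start, end + 1):
            ranks[order[k]] = (start + end) / 2 + 1
        start = end + 1
    return ranks


def spearman_rho(x, y):
    """rho = 1 - 6 * sum(d^2) / (n * (n^2 - 1)); exact only without ties."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(x)
    d_squared = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1 - 6 * d_squared / (n * (n * n - 1))


def t_for_r(r, n):
    """t statistic on n - 2 degrees of freedom for testing r against zero."""
    if not -1 < r < 1:
        raise ValueError("r must lie strictly between -1 and 1")
    return r * math.sqrt(n - 2) / math.sqrt(1 - r * r)


# Hypothetical inputs for three sets of directions, not the study's data
re_scores = [72.0, 65.5, 48.0]
rating_counts = [
    [120, 30, 10, 5, 2, 1, 0],
    [90, 40, 20, 10, 5, 2, 1],
    [60, 45, 30, 20, 10, 3, 2],
]
difficulty = [100 - re for re in re_scores]  # higher = harder, like the rating codes
means = [mean_understandability(c) for c in rating_counts]
print(f"rho = {spearman_rho(difficulty, means):.2f}")

# The reported product-moment r of -.52 with n = 9 gives t of about -1.61 on 7 df,
# short of the two-tailed .05 critical value of 2.365.
print(f"t = {t_for_r(-0.52, 9):.2f}")
```

Ranking 100 - RE from low to high is equivalent to the note's ranking of RE scores from high to low.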

Since high Flesch RE scores are not too 
relevant with highly selected groups, reliable 
understandability ratings could supplement 
them with data on the range and average 
level of intelligibility. When these indices are low for a particular
group, regardless of the material’s measured readability level, they
would indicate that revision for better comprehension is needed. With
materials whose RE scores are more relevant for the group involved,
ratings could specify limits of intelligibility that cannot be
determined from readability estimates.


Received June 7, 1955.


References 


1. Flesch, R. How to test readability. New York: 
Harper, 1951. 

2. Lockman, R. F. Readability of NavCad selection 
tests. USN Sch. Aviat. Med. Res. Rep., 1953, 
Rep. No. NM 001 0537.16.05.

3. Olds, E. G. Distributions of sums of squares 
of rank differences for small numbers of indi- 
viduals. Ann. math. Statist., 1938, 9, 133-149. 




GATB in Foreign Countries 


Beatrice J. Dvorak 
Testing Branch, U. S. Employment Service, U. S. Department of Labor 


The USES General Aptitude Test Battery 
has been translated into a number of foreign 
languages and research is being conducted in 
these foreign countries to adapt and stand- 
ardize it for use on populations in those coun- 
tries. About a year ago, this Journal pub- 
lished a list of organizations and individuals 
that had been granted permission by the 
U. S. Employment Service to use the GATB 
in such research. Foreign psychologists have 


expressed considerable interest in that information. Since the list has doubled during the
past year, an up-to-date list is presented be- 
low. While information is not available re- 
garding the status of all of these projects, it 
is known that the French, Japanese, Portu- 
guese, and Spanish editions have already been 
published. 


Argentina 


Carlos A. Pourteau Agote 


Universidad de Buenos Aires 
Laboratorio Psicotecnico
Buenos Aires, Argentina 


Australia 
H. A. Bland
Department of Labour and National Service 
Melbourne, Australia 


Belgium 
R. Buyse
University of Louvain
Tournai, Belgium
R. Dessart 


Compagnie Generale des Conduites D’Eau


Liége, Belgium 

M. Dewals

Psychotechnicien de la Société Nationale des 
Chemins de Fer Vicinaux 

Bruxelles, Belgium 


J. Gillet 
Ministére de la Defense Nationale 


Bruxelles, Belgium


Jean Herickx


Centre d’Orientation 
Bruxelles, Belgium 


P. Houssa 
Brugmann Hospital 
Bruxelles, Belgium 


F. Vandenborre 
Ministère de l'Instruction Publique 
Bruxelles, Belgium 


Brazil 


Joal Baptista d’Avilla 
Servico Nacional De Aprendizagem Industrial 
Sao Paulo, Brazil 


Jacy Magalhaes 
Divisao de Organizacao do Trabalho 
Rio de Janeiro, Brazil 


Livraria Oscar Nicolai 
Belo Horizonte, Brazil 


Eugene Novgorodoff 

Ladeira Tabajaras 140, Apto 904 
Copacabana 

Rio de Janeiro, Brazil 


S. J. Schwarzstein 
Servico de Colocacao e Informacao Profissional 
Sao Paulo, Brazil 


Canada 
G. P. Cosgrave 
Director, Counseling Service 
The Toronto Young Men’s Christian Association 
Toronto, Canada 


J. Fred Dawe 
Civil Service Commission 
Ottawa, Canada 


Thomas Fishbourne 
Canadian National Employment Service 
Ottawa, Canada 


Morgan D. Parmenter 
University of Toronto 
Toronto, Canada 


China 
Ministry of Social Affairs 
Shanghai, China 


Cuba 
Jose M. Gutierrez 
Universidad de la Habana 
Habana, Cuba 




Denmark 
Poul Bahnsen 
Director, Psykotekniske Institut 
Copenhagen, Denmark 


Paul Vidriksen 
Arbejdsdirektoratet
Copenhagen, Denmark 


Egypt 
S. A. Batraur 
Egyptian Army 
Cairo, Egypt 
S. A. Morsi 


Egyptian Army 
Cairo, Egypt 


El Salvador 


Hector Garay Pacheco 
Ministro de Trabajo y Previsión Social 
San Salvador, El Salvador 


Mario Hector Salazar 
Ministro de Trabajo y Previsión Social 
San Salvador, El Salvador 


England 
S. M. Cox 
Westminster Hospital 
London, England 


M. Desai 

Psychological Department 
London County Council 
London, England 


H. J. Eysenck 
The Maudsley Hospital 
London, England 


Edward Fox 

Winwick and Newchurch Hospital Management 
Committee 

Warrington, England 


C. B. Frisby 


Director, National Institute of Industrial Psy- 
chology 
London, England 


Roland Harper 
The University of Leeds 
Leeds, England 


D. R. Martin 


The University of Leeds 
Leeds, England 


Constance M. Mathieson 
East Anglian Regional Hospital Board
Norwich, England




C. E. Mitchell 
St. Francis Hospital 
Hayward Heath, England 


G. Naylor 
Rainhill Hospital 
Rainhill, England
C. J. Price
Westminster Hospital 
London, England 


B. W. Richards 
St. Laurence’s Hospital 
Caterham, England 


Alec Rodger 
Birkbeck College 
University of London 
London, England 


J. Tizard
The Maudsley Hospital 
London, England 


India 


N. R. Chattopadhyay 
Department of Psychology 
University of Calcutta 
Calcutta, India


D. K. Dator 
Ministry of Labour 
Bombay, India 


Bhim S. Narula 
Ministry of Labor 
New Delhi, India 


Vocational Guidance Bureau 
Bombay, India 


Israel 
Esther Gottstein 
Mental Health Clinic 
Jerusalem, Israel 


Italy 
Silvano Chiari 


Centro Di Orientamento Scolastico Professionale 
Firenze, Italy 
Gastone Conti

Instituto Tecnico Industriale 

Udine, Italy 

Vincenzo Flagiello 

Societa per l'Industria e l'Elettricita


Centro Istruzione Professionale 
Terni, Italy 


Agostino Gemelli
Director, Laboratorio di Psicologia Sperimentale
Milano, Italy


Instituto Nazionale di Psicologia 
Rome, Italy 


Guido Majaron 
Viale Arnaldo Fusinato 2F 
Vicenza, Italy 


Luigi Meschieri 
Department of Labor
Rome, Italy 


Vasco Pisani 

Consorzio Provinciale Istruzione Tecnica 

Centro Di Orientamento Scolastico Professionale 
Siena, Italy 

Giorgio Tampieri 

Consorzio Provinciale Istruzione Tecnica 

Centro Orientamento Scolastico Professionale 
Trieste, Italy 


Japan 
Gregory Kihachi Fujimoto 
Rikkyo University 
Tokyo, Japan 
T. Kondo 
Employment Security Bureau 
Tokyo, Japan 
Hiroshi Matsumoto 
Ministry of Labor
Tokyo, Japan 


Malta 


John Patrick Hamilton 
Department of Labour 
Valetta, Malta 


Mexico

Matias Lopez, Jr. 

Maria Montessori Psychology Laboratory 
Tlaxcala, Mexico 


New Zealand 


Auckland University College 
Auckland, New Zealand 


W. J. H. Clark 
Vocational Guidance Centre 
Auckland, New Zealand 


Pakistan 
Rafi Z. Khan
Pakistan Public Service Commission 
Karachi, Pakistan 




Peru 
Santiago Salinas 
Ministerio de Trabajo y Asuntos Indigenas 
Lima, Peru 


Philippines 


Apolinario Garcia Apilado 
Northern Luzon School of Arts and Trades 
Vigan, Ilocos Sur, Philippines 


Florencio Mones Apolinar 

Vocational Education Division 

Bureau of Public Schools 

Department of Education 

Manila, Philippines 

Alberto Bernardo Garcia 

Central Luzon School of Arts and Trades 
Cabanatuan City, Philippines 


Wenceslao Gozan 
Department of Labor 
Manila, Philippines 


Rodrigo L. Jarabe 

Philippine Employment Service 
Manila, Philippines 

Antonio V. Roxas 

Escolta, Manila, Philippines 


Guillermo Torres 
Mindanao College 
Davao, Mindanao, Philippines 


Scotland 
P. S. Boyd 
Department of Mental Health 
Aberdeen, Scotland 


W. M. Miller 
Department of Mental Health 
Aberdeen, Scotland 


South Africa 


Department of Psychology 
University of Stellenbosch 
Stellenbosch, South Africa 


C. P. J. Erasmus 

University of the Orange Free State 
Bloemfontein, South Africa 

Evryl Fisher 

Church Street 

Cape Town, South Africa 

D. J. Du Plessis 


Department of Labor 
Johannesburg, South Africa 


J. J. Scheepers
Department of Labor
Johannesburg, South Africa


Sweden
Torsten Husen
Centrala Varnpliktsbyran
Personalprovingsdetaljen
Stockholm, Sweden


Switzerland
J. F. Herzog
Office d'Orientation Professionelle
Neuchatel, Switzerland


Ph. H. Muller
Université de Neuchatel
Neuchatel, Switzerland


Turkey
Faruk Kardam
Turkish Employment Service
Ankara, Turkey


Venezuela
John R. Boulger
Socony-Vacuum Oil Company of Venezuela
Apartado No. 246
Caracas, Venezuela


Various Foreign Countries
Edwin R. Henry
Employment Relations Department
Standard Oil Company


Received July 11, 1955.


Journal of Applied Psychology
Vol. 40, No. 4
August, 1956


Changes in Attitudes Toward a Low-Rent Housing Project¹


Kenneth E. Clark and Robert L. Jones 


University of Minnesota 


The psychological and sociological litera- 
ture contains a number of studies which bear 
on reactions to public housing and commu- 
nity development (3, 4, 5, 6). Many such 
investigations deal with reactions of public 
housing tenants to their housing project and 
with their attitudes toward fellow tenants. 
Many have been single-shot cross-section 
studies. Few have provided for follow-up 
research on neighborhood reaction to a pub- 
lic housing project and to project residents 
over a period of time. This study is the third 
in a series reporting neighborhood reaction 
to and attitudes toward a public, low-rent 
housing project built in 1952 in a long-estab- 
lished residential area in a large midwestern 
city. It is designed to provide panel-type 
data on individual respondent opinions and 
household opinions in addition to providing 
more gross data on total sample responses 
from year to year. It examines patterns of 
attitude shift and reactions to several stages 
of “reality” in a community-change situation. 

The first article in this series (1) reported 
reactions of a fixed-address probability sam- 
ple of community residents in 1950 to the 
prospect of a housing development being built 
in the neighborhood after plans for its erec- 
tion had been approved by the city council, 
but before any construction had begun. In 
addition to opinion and attitude data, this 
initial study obtained responses to a set of 
information questions about the proposed 
project, estimates of the effect of the project 
on property values, taxes, school crowding, 


¹ The writers are indebted to the Graduate School of the University of
Minnesota for financial support of the project described herein. Field
work was performed by the professional staff of the Research Division,
School of [illegible], University of Minnesota.


noise and traffic, and data on the extent to 
which respondents had participated in com- 
munity meetings about the project. 

The second study (2) was undertaken in 
1952 just after construction of the project
had been completed but before any families 
had occupied the units. The 1950 question- 
naire was used again in the 1952 study with 
minor modifications. Interviewers called on 
all fixed-address households drawn in the ini- 
tial study sample and on an additional 192 
households which were assigned in order to 
increase the stability of breakdown analyses. 

A summary of the results of the earlier 
studies follows: 

1950 “Planning-Stage” study: As many 
neighborhood residents opposed the project 
as favored it, although a fourth of the sam- 
ple took no position on the matter; a “near” 
group of residents located within two blocks 
of the proposed site of the development was 
only slightly less favorable to the project 
than the remainder of the sample members; 
income level was not significantly related to 
favoring or opposing the project; people who 
had attended community meetings on the 
project were much more intense in their opin- 
ions and tended to favor the project; more 
than a third of the respondents thought prop- 
erty values would go down because of the 
project, while only 5 per cent thought they 
would go up because of it; more than 40 per 
cent (a plurality of the respondents on this 
item) thought the project would bring “un- 
desirable” people into the neighborhood and 
into the schools; more than a fourth indi- 
cated that the project would affect their plans 
to stay in the neighborhood; those who held
more intense opinions on the project were 
better informed about it, and persons who at- 




tended community meetings on the topic were 
better informed than those who relied on the 
daily press or on conversation with others for 
their information. 

1952 “Completion-of-Construction” study: 
The proportion of neighborhood residents fa- 
voring the project increased from 39 to 45 per 
cent and the proportion opposing dropped 
from 38 to 31 per cent. The undecided pro- 
portion remained essentially the same as be- 
fore; this upturn in favorable attitude cut 
across all income levels; there was a substantial drop in the
proportion of respondents thinking that the project would adversely affect
property values or would bring “undesirable 
persons” into the neighborhood; many fewer 
persons indicated that the project would have 
any effect on their long term plans to con- 
tinue residence in the neighborhood; there 
was some loss in information level about the 
project; on the “core question” which asked 
whether the respondent favored or opposed 
the project there was a sizable shift from 
no opinion to favor, and from oppose to un- 
decided among identical respondents inter- 
viewed in both studies. 


Procedure 


In the early summer of 1954—two years after the 
housing project began to be occupied by tenants— 
the present study was completed. The 367 fixed ad- 
dresses which were used in the 1952 study were re- 
assigned to a team of professional interviewers. The 
questionnaire from the 1952 study was slightly modified, mostly by
dropping inapplicable construction-era questions and by changing verb
tenses. The
same key questions were asked again. Interviewers 
were assigned to interview a responsible adult at 
each fixed-address household and to maintain about 
a 50-50 split between men and women. Of the 367 
assigned addresses, five were unusable in 1954 be- 
cause the houses either had been torn down or were 
vacant at the time of the survey. Out of the 362 
remaining households, interviews were obtained with 
a responsible adult at 347, or 96 per cent. Seven 
householders refused to be interviewed (less than 2 
per cent) and at the remaining eight households no 
one was found at home even with three call-backs 
at different hours of the day. Most of these families 
appeared to be on vacation according to informa- 
tion from neighbors. The 96 per cent completion- 
of-interviews figure maintained the record of this 
interviewing crew of having completed 96 per cent 
of its assignments in each of the three surveys.²
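The response-rate figures above can be checked with a few lines of arithmetic. The sketch below also confirms that the four components of the 1954 sample, as broken down later in this section against both the 1952 and the 1950 samples, each sum to the 347 completed interviews. Variable names are illustrative; all counts are the ones reported in the text.

```python
def per_cent(part, whole):
    return 100 * part / whole


# Figures reported in the Procedure section for 1954
ASSIGNED = 367
UNUSABLE = 5        # houses torn down or vacant
COMPLETED = 347
REFUSED = 7
NOT_AT_HOME = 8     # no one at home even after three call-backs

eligible = ASSIGNED - UNUSABLE
# Every eligible household is either interviewed, a refusal, or not at home
assert COMPLETED + REFUSED + NOT_AT_HOME == eligible

print(f"eligible households: {eligible}")                         # 362
print(f"completion rate: {per_cent(COMPLETED, eligible):.1f}%")   # 95.9%, reported as 96 per cent
print(f"refusal rate: {per_cent(REFUSED, eligible):.1f}%")        # 1.9%, "less than 2 per cent"

# Composition of the 1954 sample relative to the earlier samples
VS_1952 = {"identical respondent": 119, "same household": 139,
           "new family, same address": 75, "noncorresponding": 14}
VS_1950 = {"identical respondent": 49, "same household": 48,
           "new family, same address": 72, "noncorresponding": 178}
for base, parts in (("1952", VS_1952), ("1950", VS_1950)):
    print(f"vs {base}: components sum to {sum(parts.values())}")
```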


The first portion of the results reported here describes total or gross
change in reaction to the housing project, using data from all persons
sampled in 1950, 1952, and 1954.

Because of such factors as families moving into and out of the
neighborhood, vacation-taking, and refusals, and because of the
“any-responsible-adult” sampling plan within the fixed addresses, the
total samples in the 1952 and 1954 follow-up studies were made up of
four components: identical respondents from the preceding study;
identical households (families) from the preceding study but not the
same respondent; new respondents and new families in certain of the same
fixed addresses; and “noncorresponding cases,” i.e., addresses at which
an interview was obtained in one of the years, but not in the others.

A breakdown of the 1954 sample compared with the 1952 sample on these
four kinds of respondents will illustrate the composition of the latest
sample: Of the 347 interviews completed in 1954, 119 were with identical
respondents from the 1952 study, 139 were with an alternate adult in the
same households, 75 were with respondents from entirely new families
which had moved into the old fixed-address houses, and 14 were
noncorresponding cases. Using 1950 respondents as a base (keeping in
mind the addition of the new fixed addresses in 1952), the 1954 sample
has 49 identical respondents (these cases will be analyzed in panel
fashion in a later section), 48 same-households, 72 same
address-different family cases, and 178 noncorresponding cases.


Results
Total Samples


Opposition to the housing project continued to decrease during the
1952-54 period. Thus, at a time when the “reality” of the community
change was highest (when the project was no longer just a plan or a
group of untenanted buildings), more people favored and fewer opposed
the project than in its earlier stages. Responses to the “core
question,” “Do you now favor or oppose the presence of this housing
development in the neighborhood?”, and to an intensity question, “How
strongly do you feel about this?”, are presented in Table 1

² It might be mentioned that refusal to be interviewed in this study
appeared to be more a situationally influenced phenomenon than a “trait”
of uncooperativeness on the part of the refusing respondent. Initial
refusals were encountered at 17 households. Skilled follow-up
interviewers succeeded in getting 10 of these cases to complete the
ballot. Only one of the seven remaining cases was a household at which
an interview had been refused in 1952. Three of the 1954 “hard refusals”
were at households in the midst of some personal crisis at the time,
usually illness. In four cases, households which had refused an
interview in 1952 (and which were the same families in 1954) granted an
interview in 1954.




Table 1
Opinions Toward Low-Rent Housing Project in 1950, 1952, and 1954

                                                          No Opinion or
                               Favor            Oppose            Qualified
                         1950 1952 1954   1950 1952 1954   1950 1952 1954
Total group: Number        73  159  169     71  108   77     44   84  101
             Per cent      39   45   49     38   31   22     23   24   29

By intensity of feeling (per cent):
  Very strongly            42   46   41     49   53   45      0    1    0
  Rather strongly          33   35   37     39   33   47      0    4    4
  Not strongly at all      24   17   22     12   13    8     39   21   52
  No answer                 1    2    0      0    1    0     61   74   44
  Total per cent          100  100  100    100  100  100    100  100  100


alongside data from comparable questions 
from the 1950 and 1952 studies in which the 
question wording varied slightly to take ac- 
count of the planning stage and completion- 
of-construction stage of the project. 

It is interesting to note that the proportion 
of undecided respondents or persons with no 
opinion remained in the vicinity of a fourth 
of the total sample and even showed a slight 
increase in 1954. Further analysis of this
undecided group in terms of length of residence in the community
revealed that there was no significant difference in the proportion of
undecided respondents among short-, medium-, and long-term residents of
the neighborhood. The sizable incidence of undecided
response, then, does not appear to reflect any 
lack of opportunity on the part of a group 


of short-term residents to become aware of 
the development. 

Response to the “core question” next was 
analyzed by various respondent income levels. 
In phrasing the income question in 1954, an 
additional $500 was added to each response 
category compared with the 1952 question. 
This was the writers’ estimate of the average 
income increment expected during the two- 
year period. A similar adjustment had been 
made in the 1952 question as compared with 
the 1950 question. The 1952 and 1954 frequencies in the income
classifications are very similar following the $500 adjustment, which
indicates that the estimate of the typical income increment was rather
accurate. A distribution of respondent incomes
for all three studies is shown in Table 2.
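The $500 step between surveys can be expressed as a small mapping from the 1950 income categories to later ones. The helper below is hypothetical, not from the article. Its one wrinkle is that the bottom bracket keeps its $0 floor, which is how the 0-999, 0-1,499, and 0-1,999 rows of Table 2 read.

```python
# Hypothetical helper, not from the article: shift 1950 income categories
# by $500 for each later survey.
SHIFT_STEPS = {1950: 0, 1952: 1, 1954: 2}  # $500 added per survey after 1950


def shifted_bracket(low, high, year):
    """Shift a 1950 bracket (low, high) to a later survey year.

    high=None marks the open-ended top bracket. The $0 floor of the bottom
    bracket is not shifted.
    """
    shift = 500 * SHIFT_STEPS[year]
    new_low = low + shift if low > 0 else 0
    new_high = None if high is None else high + shift
    return new_low, new_high


def label(low, high):
    return f"${low:,} up" if high is None else f"${low:,}-{high:,}"


BRACKETS_1950 = [(5000, None), (4000, 4999), (3000, 3999),
                 (2000, 2999), (1000, 1999), (0, 999)]
for low, high in BRACKETS_1950:
    print("   ".join(label(*shifted_bracket(low, high, y)) for y in (1950, 1952, 1954)))
```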


Table 2
Reported Incomes of Respondents in 1950, 1952, and 1954

             Income Level                       N               Per Cent
1950          1952          1954          1950 1952 1954   1950 1952 1954
$5,000 up     $5,500 up     $6,000 up       62  108  109     33   31   32
4,000-4,999   4,500-5,499   5,000-5,999     38   54   54     20   16   16
3,000-3,999   3,500-4,499   4,000-4,999     36   71   60     19   20   17
2,000-2,999   2,500-3,499   3,000-3,999     30   53   50     16   15   14
1,000-1,999   1,500-2,499   2,000-2,999     13   21   26      7    6    4
0-999         0-1,499       0-1,999          2   15   25      1    4    7
No answer                                    7   29   23      4    8    7
Total                                      188  351  347    100  100  100


Table 3
Opinion on Housing According to Income Level in 1950, 1952, and 1954

                                                                                        Per Cent Qualified
Income Level                                            N         Per Cent Favor   Per Cent Oppose   or No Opinion
1950              1952              1954             '50 '52 '54    '50 '52 '54      '50 '52 '54      '50 '52 '54
$5,000 and up     $5,500 and up     $6,000 and up     62 108 109     44  47  49       35  34  26       21  19  24
$3,000 to 4,999   $3,500 to 5,499   $4,000 to 5,999   74 125 114     39  52  50       35  28  19       26  20  32
Less than $3,000  Less than $3,500  Less than $4,000  45  89 101     35  40  49       43  27  21       22  33  30


Table 3 presents a breakdown of response 
to the “core question” by persons of high, 
medium, and low income. Noteworthy in the 
1954 data was an evening-out of “favor” re- 
sponses over all income classes in contrast to 
the pattern in the previous studies. This 
evening-out represents a substantial increase 
in the “favor” response on the part of the 
lowest income group over the four years. 
“Oppose” responses have declined in all in- 
come classifications, but least sharply in the 
highest income group. “Oppose” responses 
among persons in the lowest income bracket 
have decreased more than half between 1950 
and 1954. A new trend appears in the 1952- 
1954 data of Table 3 for undecided respond- 
ents. The table shows a large increase from 
1952 to 1954 in the number of persons in the 
middle-income group who are undecided or 
have no opinion about the project. This re- 
verses the trend for this response from 1950 
to 1952. 
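Table 3’s three income groups appear to be formed by collapsing the six reported brackets of Table 2 (excluding “No answer”): the top bracket as high income, the next two as middle, and the bottom three as low. The sketch below assumes that grouping and reproduces the N columns of Table 3 from the Table 2 counts. The function name is illustrative.

```python
# Respondent counts by income bracket from Table 2, highest bracket first,
# "No answer" excluded, keyed by survey year.
TABLE2_COUNTS = {
    1950: [62, 38, 36, 30, 13, 2],
    1952: [108, 54, 71, 53, 21, 15],
    1954: [109, 54, 60, 50, 26, 25],
}


def collapse_to_three_groups(counts):
    """High = top bracket; middle = next two brackets; low = bottom three."""
    return counts[0], sum(counts[1:3]), sum(counts[3:])


for year, counts in TABLE2_COUNTS.items():
    print(year, collapse_to_three_groups(counts))
```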

Tables 4, 5, and 6 present responses to 
three questions requiring respondents to “pre- 
dict” effects of the project. The data are pre- 
sented for the total group in each of the 
three studies and also according to “core 


question” breakdowns isolating respondents 
who favored the project, opposed it, or were 
undecided, or had no opinion about it. 

Table 4 data on the likely effect of the 
project on property values indicates that fear 
of adverse effects had declined very sharply 
and in 1954 was no longer considered of any 
substantial importance except by the minor- 
ity who opposed the project. Even within 
this latter group, less than half felt that the 
project would reduce property values. Most
of the shift in response to this question was 
from a “property values will go down” re- 
sponse to a qualified or don’t-know response. 
Clearly, very few respondents believed the 
project would increase neighborhood property 
values. 

Table 5 indicates that the proportion of 
total respondents who in 1954 thought the 
project had brought undesirable people into
the neighborhood (a “yes” response) is very 
nearly the same as the proportion of total 
respondents who in 1952 expected that the 
project would do so. Both these figures, 
however, are well below the 1950 expecta- 
tions on this matter. It appears, then, that 
planning-stage fears concerning effect of the 


Table 4
Do You Think Property Values Will Go Up, Down, or Stay the Same?

                                 N             Go Up          Go Down        Stay Same        Other
                                            (Per Cent)      (Per Cent)      (Per Cent)     (Per Cent)
                         1950 1952 1954    '50 '52 '54     '50 '52 '54     '50 '52 '54    '50 '52 '54
Total group               188  351  347      5   4   4      35  28  13      50  60  55     10   8  28
Favor project              73  159  169      8   5   4     [?] [?] [?]     [?] [?] [?]      8   7  22
Oppose project             71  108   77    [?] [?] [?]      75  60  48      16  30  32    [?] [?] [?]
No opinion or qualified
  opinion on project       44   84  101    [?] [?] [?]      18  19   4      61  68  45     19  11  48

[?] = illegible in the source.


Table 5
Do You Think This Unit Will Bring Undesirable People Into Neighborhood?

                                 N              Yes              No            Other
                                            (Per Cent)      (Per Cent)      (Per Cent)
                         1950 1952 1954    '50 '52 '54     '50 '52 '54     '50 '52 '54
Total group               188  351  347     41  28  30      38  45 [?]      21  27  29
Favor project              73  159  169     15  12  20      73  67  56      12 [?] [?]
Oppose project             71  108   77     76  58  71      10  18   7      14  24  22
No opinion or qualified
  opinion on project       44   84  101     30  23  10      25  39  42      45  38  42

[?] = illegible in the source.


project on the kinds of people who would be 
brought into the neighborhood were reduced 
over the 1950-54 span, but that no part of 
this change to a more favorable view took 
place after the residents moved in. 

Table 5 data, broken down by how people 
responded to the “core question,” show some 
interesting trends during the three stages of 
the housing project. It is seen that more 
than three fourths of the persons opposed to 
the project in 1950 believed that the project 
would bring undesirable people into the neigh- 
borhood. In 1952 the proportion was sub- 
stantially reduced, possibly as a consequence 
of the attractive physical appearance of the 
completed project. Then in 1954, after two 
years of occupancy of the project, the view 
of the opposed group returned to just about 
its initial pessimistic level. The pattern over 
the years for the group which favored the 
project is somewhat ambiguous. Among per- 
sons who were undecided or who had no opin- 
ion on the “core question,” however, there has 
been a steady trend toward belief that the 
project would not bring and had not brought 
undesirable people into the neighborhood. 
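Because each total-group figure pools the three “core question” subgroups, it should equal the subgroup percentages weighted by their N's, apart from rounding. The minimal Python check below assumes simple N-weighting (the article does not describe how its totals were computed; the function name is illustrative). It reproduces the total-group “no” percentages of Table 5 for 1950 and 1952 from the subgroup entries:

```python
def weighted_percent(groups):
    """Return the N-weighted mean of subgroup percentages.

    groups: list of (n, percent) pairs, one pair per subgroup.
    """
    total_n = sum(n for n, _ in groups)
    return sum(n * pct for n, pct in groups) / total_n


# "No" answers to the Table 5 question, by core-question subgroup:
# (N, per cent) for favor, oppose, and no opinion or qualified.
no_1950 = [(73, 73), (71, 10), (44, 25)]
no_1952 = [(159, 67), (108, 18), (84, 39)]

print(round(weighted_percent(no_1950)))  # 38, the printed total for 1950
print(round(weighted_percent(no_1952)))  # 45, the printed total for 1952
```

The same weighting explains why the large swings within the opposed group move the total only moderately. That group makes up roughly a fifth to two-fifths of the sample in each year.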

³ Evidence concerning reaction to the appearance of the project was obtained in 1954 from a question which asked, “Do you think in the long run this housing development will make the neighborhood look more attractive, look about the same, or look less attractive?” The “more attractive” response had a plurality and was selected twice as often as the “less attractive” alternative. Another question seeking reaction to the landscaping (lawns, trees, and shrubs) of the project was answered in the “very good” response category by 49 per cent of the sample. Only 3 per cent said the landscaping looked “rather bad.”

One result of the 1950 study which indicated the extent of opposition to the plans for the project was the rather sizable number of persons who said that such a project would affect their long-term plans to remain as residents in the neighborhood. At that time over half of those opposed to the project and over a quarter of all respondents indicated that the project probably would influence them to leave the neighborhood. Table 6 shows data for all three periods of time on this question for the total sample and for the favor, the oppose, and the no-opinion/undecided breakdowns on the “core question.”

For the total group, for those who favored 
the project, and for those who had no opin- 
ion or who were undecided about it, the trend 
on the leave-the-neighborhood question is 
quite steady over the years. The project has 
decreased in importance as an influence on 
plans to stay in the neighborhood until only 
one person in 10 in the total sample in 1954 
regarded the project as a deterrent to his 
continued residence. In 1954 a majority 
even of those who opposed the project did 
not see it as influencing their continued resi- 
dence, although about a third of this group 
did say they intended to leave the
neighborhood because of it. The extremely 
low “yes” percentages for the favor and for 
the undecided-no opinion groups indicate that 
the project is of almost no importance to these 
persons as a matter affecting continued neigh- 
borhood residence. 

Another insight into this area of concern was provided in 1954 by a supplementary question asking each respondent whether he knew of anyone who had moved out of the neighborhood because of the housing project. Fifteen per cent of the respondents said they did know of such a case or cases. Breakdown analysis shows that the proportion knowing of such a case was about twice as great among persons who opposed the project as among persons who favored or were undecided about it.


206 Kenneth E. Clark and Robert L. Jones

Table 6
Will Construction of Development Have Effect on Your Long-Term Plans to Stay or Move Out of Neighborhood?

Group                     Year     N        Yes         No      Other
                                    (Per Cent) (Per Cent) (Per Cent)
Total group               1950   188         28         64     illeg.
                          1952   351         12         77     illeg.
                          1954   347         10         85     illeg.
Favor project             1950    73         11         86     illeg.
                          1952   159          1         96     illeg.
                          1954   169          1         97     illeg.
Oppose project            1950    71         55         28         17
                          1952   108         33         47         20
                          1954    77         39         51     illeg.
No opinion or qualified   1950    44         11     illeg.          5
  opinion on project      1952    84          6         79         15
                          1954   101          4     illeg.          3

Further evidence was obtained on the influence of the project on continued neighborhood residence. The 1952 ballot responses of families who actually left the neighborhood between 1952 and 1954 were analyzed. This analysis sheds light on the pre-moving attitudes of “actual” movers instead of basing attitude analysis on statements of intent to move. Data on the intent-to-leave question and the “core question” in 1952 for those individuals who did leave the neighborhood between 1952 and 1954 indicate that those who moved and those who stayed did not differ significantly in their 1952 responses to the intent-to-leave question. Further, those who moved and those who stayed are not significantly differentiated by their “core question” responses. In other words, persons who said that the project would affect their long-term plans to stay in the neighborhood have remained in the neighborhood in about the same proportion as those who did not see the project as an influence on continued residence. It seems, then, that expressions of project influence on plans to stay in the neighborhood, at least over a two-year period, are not correlated with actual moving behavior.
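The article does not say which significance test was used to compare movers with stayers. One common choice for a comparison of this kind is a Pearson chi-square test on a 2 × 2 table: moved or stayed, crossed with whether the respondent said the project would or would not affect his plans. The sketch below assumes that test without a continuity correction. The function name and the counts are illustrative only, and the source does not report the actual frequencies.

```python
def chi_square_2x2(table):
    """Pearson chi-square statistic (no continuity correction) for a 2x2 table.

    table: [[a, b], [c, d]] observed counts.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [table[0][j] + table[1][j] for j in range(2)]
    n = sum(row_totals)
    stat = 0.0
    for i in range(2):
        for j in range(2):
            # Expected count under independence of rows and columns.
            expected = row_totals[i] * col_totals[j] / n
            stat += (table[i][j] - expected) ** 2 / expected
    return stat


# Critical value of chi-square with 1 degree of freedom at the .05 level.
CRITICAL_05_DF1 = 3.841

# Hypothetical counts (NOT from the article):
# rows = moved / stayed; columns = said project would / would not affect plans.
hypothetical = [[10, 20], [30, 40]]
stat = chi_square_2x2(hypothetical)
print(f"chi-square = {stat:.3f}, significant at .05: {stat > CRITICAL_05_DF1}")
```

A statistic below the critical value, as in this made-up case, corresponds to the article's finding of “no significant difference.”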

Certain information questions concerning the project were included in the 1950, 1952, and 1954 questionnaires. In the earlier ballots, some questions concerning the physical appearance of the units were asked. On the assumption that there would be almost complete familiarity with such matters by 1954, these questions were dropped in 1954. This reduction left just three information questions common to all three surveys. Table 7 shows the trend over the years in information about aspects of the project covered by these items. Question 1 asked, “About how many families do you understand will be housed in this development?” Question 2 asked, “What is the most money a family can make a year and still rent a place in the development?” and Question 3 inquired, “If an undesirable family gets into this development will the housing authority be able to get them out?” On both Questions 1 and 2, which required fairly specific information, there was a sharp reduction in correct responses from 1952 to 1954. Less than one person in 10 knew the correct answer to these questions in 1954, whereas from a fourth to a third knew the answers in 1952. On Question 3 there was an increase in the number of persons knowing that the housing authority could evict undesirable families from the development. These data suggest that as the issue of the housing development passed from the planning stage to the building stage to the stage of an accomplished residential fact, the specific information level of neighborhood residents about the project declined sharply. This decline was proportionately large in every category of favorableness or unfavorableness toward the project.


Table 7
Correct Responses to Information Questions in 1950, 1952, and 1954

Group                     Year     N Question 1 Question 2 Question 3
                                    (Per Cent) (Per Cent) (Per Cent)
Total group               1950   188         30         27         24
                          1952   351         23         34         39
                          1954   347         10          8         47
Favor project             1950    73     illeg.         30     illeg.
                          1952   159     illeg.         32         52
                          1954   169     illeg.         10         56
Oppose project            1950    71     illeg.     illeg.         24
                          1952   108     illeg.     illeg.         26
                          1954    77     illeg.     illeg.         39
No opinion or qualified   1950    44         23         16         27
  opinion on project      1952    84         19         29         32
                          1954   101          8          7         39


Responses in each of the study years to a set of “appearance” and “project nuisance” items are shown in Table 8. No significant trend occurs over the years in response to a question about whether the project will make or has made the neighborhood more attractive physically. Two questions, one on whether shopping has been made easier or harder and one on whether the project has made the neighborhood a more or less pleasant place to live, show sharp trends away from either a definite “more” or “less” answer and toward an “unchanged” response. These data indicate that actual experience with the project as a neighborhood entity leads a vast majority of respondents to judge that things are about the same as before.


Table 8
Responses to “Appearance” and “Project Nuisance” Questions in Three Stages of Project Reality

                                                      1950                  1952                  1954
                                          (Planning Stage)  (Construction Stage)     (Occupancy Stage)
Question and Response                        N    Per Cent         N    Per Cent         N    Per Cent

Will/Has the project make/made the neighborhood look more attractive?
  More attractive                           68          36       150          43       141          41
  Less attractive                           62          33        82          23        68          20
  About the same                            43          23       101          29        99          28
  DK/no opinion                             15           8        18           5        39          11
Will/Has the project make/made shopping harder?
  Easier                                    11           6        17           5         1          <1
  Harder                                    28          15        48          14         8           2
  No change                                128          68       268          76       318          92
  DK/no opinion                             21          11        18           5        20           6
Will/Has the project make/made the neighborhood a more pleasant place to live?
  More pleasant                             26          14        34          10        16           5
  Less pleasant                             70          37        82          23        68          19
  No change                                 75          40       197          56       226          65
  DK/no opinion                             17           9        38          11        37          11
Will/Has the project make/made the neighborhood more noisy?
  Yes                                       87          46       128          37        74          21
  No                                        86          46       204          58       241          70
  DK/no opinion                             15           8        19           5        32           9
Will/Has the project create/created a traffic nuisance?
  Yes                                       55          29       102          29        41          12
  No                                       109          58       213          61       278          80
  No opinion                                24          13        36          10        28           8

Two questions about physical nuisances 
(noise and traffic hazards) resulting from 
the project show strong and significant trends 
toward a “no nuisance” response in 1954 
after two years of exposure of neighborhood 
residents to any such nuisance which the 
project might cause. Experience with the 
project, then, has sharply lessened earlier ex- 
pectations concerning these physical nuisance 
matters. 
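Each “Per Cent” entry in Table 8 is a response count divided by that year's total sample (188, 351, and 347). The Python sketch below uses only counts printed in the table to recompute the growth of the “no change” answers on shopping and pleasantness and of the “no” answer on the traffic item. Rounding halves up to whole per cents is an assumed convention, and the names in the code are illustrative.

```python
import math

# Total respondents in each study year (Table 8).
YEAR_TOTALS = {1950: 188, 1952: 351, 1954: 347}

# Counts from Table 8 for the "unchanged" or "no nuisance" responses.
NEUTRAL_COUNTS = {
    "shopping: no change": {1950: 128, 1952: 268, 1954: 318},
    "pleasantness: no change": {1950: 75, 1952: 197, 1954: 226},
    "traffic nuisance: no": {1950: 109, 1952: 213, 1954: 278},
}


def percent(count, total):
    """Whole-number percentage, rounding halves up (an assumed convention)."""
    return math.floor(100 * count / total + 0.5)


def shares(counts):
    """Map each year to its whole-number percentage of that year's sample."""
    return {year: percent(n, YEAR_TOTALS[year]) for year, n in counts.items()}


for item, counts in NEUTRAL_COUNTS.items():
    print(item, shares(counts))
```

The recomputed shares match the printed percentages for these three rows and rise steadily from 1950 to 1954.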


Identical Respondents 


The preceding analysis has described total- 
sample changes in neighborhood reaction to 
the housing development but has not shown 
important data on shifts in attitudes on the 
part of individuals. Although names of re- 
spondents were not taken in the interviews, 
accurate respondent matching was possible 
through responses on sex, age, education, oc- 
cupation, and length of residence in the dwell- 
ing unit. In addition, a number of 1954 re- 
spondents commented to field personnel that 
they had been interviewed in one of the previ- 
ous studies. 

Two sets of panel-type results are avail- 
able on individuals, one based on data from 
identical respondents who were interviewed 
in the 1950 and in the 1954 study, and an- 
other based on identical respondents inter- 
viewed in the 1952 and in the 1954 studies. 
The major analyses reported here are for the 
1950-1954 group of 49 cases who represent 
identical individuals interviewed at the earli- 
est and at the latest stages of inquiry con- 
cerning the project. 

Table 9 shows responses on the “core question” for this four-year panel group.


Table 9
Comparison of Responses of Same Individuals on “Core Question” in 1950 and 1954

                                        1954 Response
                                          Qualified
1950 Response                Oppose   or No Opinion     Favor     Total
Favor                             3               2      [13]        18
Qualified or no opinion           1             [7]         3        11
Oppose                          [8]               6         6        20
Total                            12              15        22        49


A rather striking trend emerges from this panel table. Clearly, the greatest amount of shift is from an earlier “oppose” or “no-opinion” response to a more favorable subsequent response. Fifteen persons shifted to a more favorable view while only five shifted to a less favorable one. Nearly a third of all respondents who in 1950 or in 1952 were opposed to the project swung all the way over to the “favor” response by 1954. Half of those who shifted from “oppose” to some other answer shifted all the way to “favor”; the remainder, of course, shifted to “no opinion” or “qualified.” This is in contrast to a similar analysis of identical respondents from 1950 to 1952 reported in an earlier article (2), where the gains in a favorable direction for a same-individual panel group came very largely from no opinion to favor or from oppose to no opinion.
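The shift counts and stability proportions for Table 9 can be read directly from the turnover matrix. Diagonal cells are unchanged answers, and cells to the right of the diagonal are moves in a favorable direction. The Python sketch below encodes Table 9 with rows and columns reordered from least to most favorable (oppose, qualified or no opinion, favor). This ordering and the function names are assumptions made for illustration. The sketch recovers the 15 favorable shifts, the 72 and 40 per cent retention figures, and the 57 per cent overall agreement for same-individuals.

```python
# Response categories ordered from least to most favorable.
CATEGORIES = ["oppose", "qualified/no opinion", "favor"]

# Table 9: rows = 1950 response, columns = 1954 response (same order as CATEGORIES).
TABLE_9 = [
    [8, 6, 6],   # 1950 oppose
    [1, 7, 3],   # 1950 qualified/no opinion
    [3, 2, 13],  # 1950 favor
]


def more_favorable_shifts(matrix):
    """Count respondents whose later answer is more favorable than the earlier one."""
    size = len(matrix)
    return sum(matrix[i][j] for i in range(size) for j in range(size) if j > i)


def retention(matrix, category):
    """Percentage of an initial category giving the same answer later."""
    i = CATEGORIES.index(category)
    return 100 * matrix[i][i] / sum(matrix[i])


def agreement(matrix):
    """Percentage of all respondents whose answer did not change."""
    total = sum(sum(row) for row in matrix)
    return 100 * sum(matrix[i][i] for i in range(len(matrix))) / total


print(more_favorable_shifts(TABLE_9))       # 15
print(round(retention(TABLE_9, "favor")))   # 72
print(round(retention(TABLE_9, "oppose")))  # 40
print(round(agreement(TABLE_9)))            # 57
```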

Data from the 1952-54 identical-respond- 
ent cases corroborate the trends of Table 9. 
For this group of 119 cases, 35 shifted to a 
more favorable answer during the two years 
while only 16 shifted to a less favorable view. 

There is a suggestion in Table 9 that those 
respondents who in 1950 gave “no opinion” 
or “undecided” responses and who shifted to 
another answer in 1954 shifted mostly to “fa- 
vor.” This trend is strongly supported by 
the 1952-54 data where two-thirds of per- 
sons giving initial no opinion or qualified an- 
swers in 1952 changed their view in 1954 to 
“favor.” 

The greater stability of an initial “favor” response compared with an initial “oppose” response is indicated by the following proportions. Seventy-two per cent of identical persons who favored the project in 1950 favored it in 1954; similarly, 69 per cent of 1952-54 identical respondents who favored the project in 1952 again favored it in 1954. By comparison, 40 per cent of identical persons who opposed the project in 1950 opposed it in 1954, and for 1952-54 identical respondents, 55 per cent of those who opposed the project in 1952 opposed it in 1954.

Highlights of 1950-54 individual panel data 
analysis on other questions include such find- 
ings as: 


1. Less than a third of those persons who 
in 1950 expected that property values would 
go down as a result of the project responded 
that values actually had gone down by 1954. 
In fact, all persons in this panel group who 
said in 1954 that property values had gone 
up between 1950 and 1954 were in the group 
who believed in 1950 that the project would 
reduce property values. 

2. More than two-thirds of the same indi- 
viduals who said in 1950 that the project 
would make the neighborhood a less pleasant 
place in which to live shifted by 1954 to a 
more favorable position—largely to a re- 
sponse of “no change” in neighborhood pleas- 
antness. 

3. Decided shifts occurred on two physical 
nuisance items. Two-thirds of the individu- 
als who thought in 1950 that the project 
would make the neighborhood more noisy 
shifted to a more favorable response in 1954. 
More than half of those who thought the 
project would create a traffic nuisance shifted 
to a more favorable response in 1954. 

4. A shift in a favorable direction was 
noted on the question, “Are people in the 
housing units about the same as others in 
the neighborhood?” A 13 per cent marginal 
gain in the “Yes” response was picked up 
about equally from initial “No” and “Don’t- 
know” respondents. Very little shift from an 
initial “Yes” to a subsequent “No” response 
occurred. 

5. The item with the greatest amount of turnover was the one inquiring whether the project would make the neighborhood physically more attractive. The marginal totals on this item showed a small gain in the direction of a more attractive response. Internally, however, the table showed a very great number of shifts. The marginal gain was due largely to about half of the persons who initially thought the neighborhood would look “about the same” shifting to a “more attractive” response in 1954. Counterbalancing this, less than half of those who initially thought the project would make the neighborhood more attractive in 1950 still thought so in 1954.


A further value of these panel data on 
same-individuals is the light they shed on the 
total sample results reported in the preceding 
section. If the trend in responses of the 
same individuals over the years turned out 
to be systematically different from “trends” 
in the cross-section response totals for the 
complete sample, then doubt would be raised 
concerning the accuracy of interpreting year- 
by-year differences in cross-section totals in 
terms of neighborhood “changes” in attitude. 
What might appear to be shifts toward more favorable views in the total sample over time could actually reflect, for example, the views of new residents at the fixed sample addresses who were less involved and hence more neutral or more favorably disposed.

The marginal frequency proportions in 1954 
for the 1950-54 same-individuals are not sig- 
nificantly different, however, from remainder- 
of-sample responses on any of the attitude 
items on the ballot. Marginal data for same- 
individuals for 1954 from Table 9 give an 
example of this correspondence on the “core 
question.” The 1950-54 same-individual re- 
sponses on this question in 1954 are Favor— 
45 per cent, Oppose—22 per cent, and No 
Opinion/Undecided—33 per cent. The re- 
mainder-of-sample proportions are 49, 22, 
and 29 per cent respectively. These findings 
indicate that the contribution to the total 
sample attitude picture made by new resi- 
dents, noncorresponding cases, and the like 
is in accord with the pattern of attitude shifts 
discernible through the analysis of identical 
individuals interviewed over the years. This 
lends strength to the total sample results as 
indicative of trends in neighborhood reaction 
to the housing development. 




Same Households, Different Respondents 


In 187 instances in 1954 an interview was 
obtained at a fixed address with a responsible 
adult who was a member of the family of the 
person interviewed at an earlier date, but not 
the previous interviewee himself. In 48 of 
these cases, the families were the same fami- 
lies interviewed in 1950, and in 139 cases the 
families were the same families interviewed 
in 1952. 

Results from these “family-member panels” 
on certain of the main questions in the ballot 
will be presented here. 

On the “core question,” the marginals in 
Tables 10 and 11 show that 1954 responses 
from same-household respondents are quite 
different from responses of the remainder of 
the sample and from responses of same-in- 
dividuals in showing no increase in the “fa- 
vor” response in 1954. Table 10, in fact, 
shows a slight decrease in the “favor” re- 
sponse. In contrast to the data in the previ- 
ous section on same-individuals, shifts from 
previous views in the same-family sample are 
almost equally distributed in more favorable 
and less favorable directions. A combination 
of data from Tables 10 and 11 indicates that 
49 same-family members who shifted from an 
earlier view expressed by an alternate adult 
respondent in the family shifted to a less favorable view; 48 of the shifters moved to a more favorable view.

Table 10
Comparison of Responses of Same-Family Respondents on “Core Question” in 1950 and 1954

                                        1954 Response
                                          Qualified
1950 Response                Oppose   or No Opinion     Favor     Total
Favor                             4               4      [14]        22
Qualified or no opinion           2             [2]         4         8
Oppose                          [9]               8         1        18
Total                            15              14        19        48


Table 11
Comparison of Responses of Same-Family Respondents on “Core Question” in 1952 and 1954

                                        1954 Response
                                          Qualified
1952 Response                Oppose   or No Opinion     Favor     Total
Favor                             8              17      [42]        67
Qualified or no opinion          14            [12]        13        39
Oppose                         [11]              12        10        33
Total                            33              41        65       139


Relative stability of response on the “core question” for same-individual and same-household respondents was analyzed to give some notion of the comparative firmness of view for these classes of respondents. Only 48 per cent of same-household respondents gave a “core question” response identical with the earlier reply of another household member. This is substantially lower than the corresponding percentage, 57, for same-individuals. These and the preceding data in this section suggest that a moderate amount of division of opinion exists within households.
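The same-household figures can be checked by pooling Tables 10 and 11. In the Python sketch below, the cell counts are those implied by the printed row and column totals of the two tables. Rows and columns are reordered from least to most favorable, and the names are illustrative. Pooling the tables gives the 48 more favorable and 49 less favorable shifts reported earlier in this section, as well as the 48 per cent agreement rate.

```python
# Rows = earlier response, columns = 1954 response, both ordered
# oppose, qualified/no opinion, favor (least to most favorable).
TABLE_10 = [  # 1950 vs 1954, same-family respondents
    [9, 8, 1],
    [2, 2, 4],
    [4, 4, 14],
]
TABLE_11 = [  # 1952 vs 1954, same-family respondents
    [11, 12, 10],
    [14, 12, 13],
    [8, 17, 42],
]


def tally(matrices):
    """Pool turnover matrices; return (more favorable, less favorable, unchanged)."""
    up = down = same = 0
    for m in matrices:
        for i, row in enumerate(m):
            for j, count in enumerate(row):
                if j > i:
                    up += count
                elif j < i:
                    down += count
                else:
                    same += count
    return up, down, same


up, down, same = tally([TABLE_10, TABLE_11])
print(up, down)                                # 48 49
print(round(100 * same / (up + down + same)))  # 48
```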

Through the remainder of the question- 
naire items, the 1950-54 same-household re- 
spondents deviated further in marginal item 
totals from the remainder of the sample than 
did the same-individuals. A characteristic of 
same-family respondents in 1952-54 was the 
giving of substantially fewer “Don’t-know,” 
“No-opinion,” or “Qualified” answers to the 
various attitude items than nonpanel respond- 
ents. Same-family respondents tended to have 
much more definite views. The item which 
most characterized this tendency was the one 
inquiring whether property values had gone 
up, gone down, or stayed the same since the 
project was begun. Pooling all cases of same-family respondents shows that only about a sixth had no opinion on this item, while more than a third of the rest-of-sample respondents, excluding same-individuals, had no opinion. This is reasonable in view of the greater opportunity of same-family respondents to become aware of property value trends over time.




Summary 


In 1950, shortly after city government ap- 
proval was given to plans for a low-rent pub- 
lic housing project in an established residen- 
tial neighborhood of a large midwestern city, 
interviews were conducted with neighborhood 
residents to determine their opinions about 
the project. Information questions about the 
project also were included. A fixed-address 
sampling plan was employed. Results from
this study were called “planning-stage” data 
and represented views of neighborhood resi- 
dents at an early stage of project reality. 
Opinions were not anchored to any physical 
and tangible neighborhood change but were 
held with respect to less tangible plans and 
prospects. 

In 1952, when construction of the project 
had been completed but before any tenants 
had moved in, a second study was completed, 
using the same set of questions and the same 
fixed addresses plus another set of fixed ad- 
dresses drawn to expand the sample. Results 
from this study were called “construction- 
stage” data and represented opinions an- 
chored to the physical reality of the finished 
housing, but not to the human and social re- 
ality of the presence of occupants. 

In 1954, after tenants had been occupying 
the project for two years, a third study was 
completed using many of the items from the 
earlier ballots and using the same fixed-ad- 
dress sample. Results from this study were 
called “occupancy-stage” data and represented 
opinions anchored both to the physical re- 
ality of the housing and to the human and 
social reality of the occupants as neighbors 
in the community area. 

Results showed that: 


1. For the total samples, and for nearly all 
questions, a definite trend toward more fa- 
vorable opinions toward the project and its 
occupants was discerned from the planning 
to the construction to the occupancy stages. 

2. On a “core question” asking directly 
about approval or disapproval of the project 
a steady increase in “favor” and a decrease 
in “oppose” responses was noted. This trend 
cut across all income classes, but was most 
pronounced for the lower income groups. 




3. Economic-centered fears that the project
would lower property values were largely dis- 
pelled over the four-year period. Few re- 
spondents believed that their taxes had gone 
up because of the project. 

4. Responses to questions concerning the 
effect of the project on the attractiveness of 
the neighborhood and its pleasantness as a 
place to live show little trend to any definite 
view over the years. It seems that the physi- 
cal and social “anchors” for attitudes toward 
the housing development since 1950 have, if 
anything, affected responses to these questions 
in a direction of “no change.” 

5. Responses to person-centered questions 
inquiring whether the project has brought un- 
desirable people into the neighborhood and 
whether project tenants are like or unlike 
people in the rest of the neighborhood show 
slight tendencies toward more favorable views 
toward the tenants. There has been no significant change in responses to these questions since tenants moved into the project.

6. Many fewer persons in the latest study 
say the project is affecting their long-term 
plans to remain as residents in the neighbor- 
hood. Analysis of the earlier responses of 
persons who actually moved out of the neigh- 
borhood during the period of occupancy of 
the project indicates no significant difference 
on this intent-to-remain-in-neighborhood ques- 
tion between this group of “movers” and the 
rest of the sample who stayed.

7. A group of physical nuisance items on 
neighborhood noise levels, traffic nuisances, 
school crowding, and shopping difficulty show 
strong shifts to more favorable responses dur- 
ing the period of actual occupancy of the 
project when an experience-based anchor for 
opinions was present. 

8. Two information questions about the 
project which required rather specific knowl- 
edge to answer correctly showed considerable 
decline in proportion of correct response from 
1950 and 1952 to 1954. One more general 
information question was answered correctly 
by more persons in 1954. These data suggest that as the project became less of a planning-stage “issue” in the neighborhood and was instead an accomplished fact, the detailed information level about it among neighborhood residents declined.

9. Panel-type analysis of the opinions of 
identical individuals who were respondents in 
1950 and again in 1954 demonstrated that 
such individuals shifted in the same fashion 
as indicated by total-sample results. For these
people, there was much greater stability on a 
“favor” response on the “core question” than 
on “oppose.” Over the years many of those 
who shifted from “oppose” shifted all the 
way to “favor” on this question. Decided 
shifts in a favorable direction were noted on 
property-value expectations, on neighborhood 
pleasantness, and on physical nuisance items 
involving the project. Same-individual data 
from 1952 to 1954 corroborated most of these 
findings.

10. Panel-type analysis of respondents from 
same families interviewed in previous studies, 
but not the same individuals previously inter- 
viewed, showed that data from these same- 
family respondents departed on several ques- 
tions from the total-sample trend and from 
same-individual patterns. This suggests that 
a goodly amount of within-household opinion 
variance on the housing project exists. 

11. Of incidental interest is that, for the third time, 96 per cent of assigned fixed-address households yielded completed interviews. The number of refusals in all years was under 2 per cent, thus demonstrating the effectiveness of a highly trained interviewing staff and a persistent call-back field plan.


Received October 19, 1955.


References


1. Clark, K. E., & Swanson, C. E. Neighborhood reaction to public low-rent housing. J. appl. Psychol., 1951, 35, 342-347.

2. Clark, K. E., & Swanson, C. E. Attitudes toward public low-rent housing, before and after construction. J. appl. Psychol., 1953, 37, 201-206.

3. Festinger, L., & Kelley, H. H. Changing attitudes through social contact. Ann Arbor: Research Center for Group Dynamics, Univer. of Michigan, 1951.

4. Housing Authority of Baltimore City. A study of the attitudes of potential applicants toward public housing. Baltimore: Housing Authority of Baltimore City, 1954.

5. Merton, R. K., West, Patricia S., Jahoda, Marie, & Selvin, H. C. Social policy and social research in housing. J. soc. Issues, 1951, 7, Nos. 1-2.

6. Wilner, D. M., Walkley, Rosabelle, & Cook, S. W. Residential proximity and inter-group relations in public housing projects. J. soc. Issues, 1952, 8, 45-69.


The Journal of Applied Psychology 
Vol. 40, No. 4, 


Validity and Factor Analyses of Naval Air Training 
Predictor and Criterion Measures 


John T. Bair, Robert F. Lockman,* 
and Charles T. Martoccia * 


U. S. Naval School of Aviation Medicine 


World War II pilot candidate selection re- 
search resulted in test batteries that were ef- 
fective in predicting flight training success in 
the Navy and Air Force (5, 6). Some valid 
Air Force predictors, however, were not in- 
cluded in the Navy selection battery, particu- 
larly tests of spatial and perceptual abilities 
(7). 

A recent Navy-Air Force joint research 
project, the Pilot Candidate Selection Re- 
search Program (PCSRP), involved the ad- 
ministration of 69 experimental tests to a 
population of 2,126 Navy midshipmen about 
to enter flight training (11). Validity data 
from this project indicated that spatial and 
perceptual ability tests yielded some of the 
highest correlations with a pass-fail flight 
training criterion (12). On the basis of this 
evidence the Spatial Apperception Test was 
added to the Navy Flight Aptitude Rating 
selection battery (1). This test was pat- 
terned after the Air Force Aerial Orientation 
test which correlated .34 with graduation or 
elimination for academic reasons, and .31 with 
completion or elimination for flight proficiency 
reasons in the PCSRP study. Preliminary 
validity evidence warranted retention of the 
Navy test in the selection battery, but more 
extensive investigation of the relations of 
spatial and perceptual abilities with flight 
training criteria is needed. This was the ma- 
jor purpose of the present study. The spe- 
cific objectives were: (a) to investigate cer- 
tain spatial and perceptual tests together with 
other measures of differential abilities in re- 
lation to both academic and flight training 
grades in the Naval Air Training Program, 
(b) to obtain more information on the factorial structure of these spatial and perceptual abilities, and (c) to provide data to aid construction of more reliable and valid measures of spatial, perceptual, and other differential abilities in relation to success in the Naval Air Training Program.

1 Now at the Bureau of Naval Personnel, Washington, D. C.

2 Opinions or conclusions herein are those of the authors and do not necessarily reflect the views or possess the endorsement of the Navy Department.
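Validity coefficients such as the .34 and .31 quoted above are correlations between a test score and a criterion. When the criterion is a dichotomy (graduated versus eliminated), the Pearson correlation with a 1/0 coding of the criterion is the point-biserial coefficient. The Python sketch below computes it from scratch on a handful of invented scores. The function name, the data, and the 1/0 coding are assumptions for illustration, and the text does not specify which coefficient the PCSRP analyses actually used; a biserial r, for example, would give different values.

```python
import math


def point_biserial(scores, passed):
    """Pearson correlation between continuous scores and a 0/1 criterion."""
    n = len(scores)
    if n != len(passed) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 cases")
    mean_x = sum(scores) / n
    mean_y = sum(passed) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(scores, passed))
    var_x = sum((x - mean_x) ** 2 for x in scores)
    var_y = sum((y - mean_y) ** 2 for y in passed)
    return cov / math.sqrt(var_x * var_y)


# Invented example: test scores and training outcome (1 = completed, 0 = eliminated).
scores = [12, 15, 9, 20, 17, 11, 14, 18]
passed = [0, 1, 0, 1, 1, 0, 0, 1]
print(round(point_biserial(scores, passed), 2))
```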


Procedure 


A battery of seven standardized spatial and per- 
ceptual ability tests was administered to a group of 
125 naval aviation cadets in D Stage of basic flight 
training during the fall of 1952. This was the in- 
strument and radio instruction stage which in 1952 
followed primary flight training. To obtain cri- 
terion data, the sample was followed through to the 
completion of training. The sample consisted of 108 
cadets who completed training and were designated 
naval aviators. Seventeen cadets were excluded from 
the original sample because of attrition or incom- 
plete records. 

Scores on six other differential ability tests were 
included as variables. Four of these were adminis- 
tered during pre-flight school; the other two had 
been administered in the initial cadet selection bat- 
tery. 

Three pre-flight school academic grades and four 
flight-training grades were included as criterion vari- 
ables. A description of the predictor and criterion 
variables follows: 


Tests Administered During D Stage 


1. Revised Minnesota Paper Form Board (Series 
MA): requires the selection of an appropriate as- 
sembled two-dimensional geometric figure after men- 
tal manipulation of the unassembled parts. A reli- 
ability of .92 has been reported (13). 

2. DAT Space Relations (Form A): measures an 
ability to visualize a constructed object from a pat- 
tern and how the object would appear if rotated in 
various ways. An additional feature of this test is 
the mental manipulation of objects in three-dimen- 
sional space. A mean reliability of .93 has been re- 
ported (2). 

3. Guilford-Zimmerman Spatial Orientation (Form 
A): measures an ability to evaluate the spatial po- 
sition of objects with reference to the human body. 
It requires awareness of whether or not one object 
is to the right or left, higher or lower, and nearer or 
farther from another object. A reliability of .88 has 
been reported (9). 

4. DAT Clerical Speed and Accuracy (Form A): measures speed of response in simple perceptual tasks requiring the selection of a proper number or letter combination from a series of other combinations. A mean reliability of .87 has been reported (2).

5. Minnesota Clerical-Number Comparison (Part 
1 of the Minnesota Clerical Test): requires quick and 
accurate comparisons of number combinations for 
similarities or differences. A reliability of .76 has
been reported (13).

6. Minnesota Clerical-Name Comparison (Part 2 
of the Minnesota Clerical Test): requires the rapid 
and accurate comparison of proper name combina- 
tions for similarities or differences. A reliability of 
.83 has been reported (13). 

7. Topographical Orientation Test: measures orienta-
tion to specific geographic points. Using a compass
rose as a reference and Chicago and Pensacola as 
points of origin, the examinee indicates the direction 
to 10 cities in the United States and 10 foreign cities 
by the shortest possible route without the aid of a 
map. The test score is an error score derived by 
summing an individual's deviations from the initial 
great circle headings for each city. A reliability of 
.86 has been reported (3). 
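The scoring rule for the Topographical Orientation Test can be sketched as follows. This is a minimal illustration, not the test's actual scoring key: it assumes that each deviation is the smallest absolute angle (0 to 180 degrees) between the examinee's heading and the true initial great-circle heading, and that coordinates are given in decimal degrees. The function names and input format are hypothetical.

```python
import math


def initial_bearing(lat1, lon1, lat2, lon2):
    """Initial great-circle heading from point 1 to point 2, in degrees
    clockwise from true north (0 to 360). Inputs are decimal degrees,
    with west longitudes and south latitudes negative."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dlon = math.radians(lon2 - lon1)
    x = math.sin(dlon) * math.cos(phi2)
    y = math.cos(phi1) * math.sin(phi2) - math.sin(phi1) * math.cos(phi2) * math.cos(dlon)
    return math.degrees(math.atan2(x, y)) % 360.0


def angular_deviation(heading, true_heading):
    """Smallest absolute difference between two compass headings (0 to 180)."""
    diff = abs(heading - true_heading) % 360.0
    return min(diff, 360.0 - diff)


def error_score(origin, responses):
    """Sum of deviations from the true initial great-circle headings.

    origin:    (lat, lon) of the point of origin (hypothetical input format)
    responses: {(lat, lon) of a target city: heading given by the examinee}
    """
    lat0, lon0 = origin
    return sum(
        angular_deviation(heading, initial_bearing(lat0, lon0, lat, lon))
        for (lat, lon), heading in responses.items()
    )
```

With Chicago as the point of origin one would pass its approximate coordinates, (41.9, -87.6), and the examinee's heading for each target city; a lower score means better orientation.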


Tests Administered in the U. S. Naval School, Pre-Flight


8. ACE Psychological Examination-L (1947 edi- 
tion): includes sentence completion items, artificial 
language, and vocabulary same-opposites. A reli- 
ability of .95 has been reported for this language 
section of the 1938 edition (13). 

9. ACE Psychological Examination-Q (1947 edi- 
tion): includes arithmetic reasoning, figure analogies, 
and number-series items. A reliability of .87 has 
been reported for this quantitative section of the 
1938 edition (13). 

10. GED Correctness and Effectiveness of Expres- 
sion (College Form A): measures the understanding 
of certain skills in English usage, particularly gram- 
mar and spelling, at the college freshman level. No 
reliabilities have been reported. 

11. Essentials of Mathematics: measures certain 
elementary mathematical skills. In Part I the items 
measure proficiency in addition, subtraction, multi- 
plication, and division of whole numbers, fractions, 
and decimals. Part II covers general reasoning prob- 
lems requiring high school algebra and geometry. 
This test was developed at the U. S. Naval School, 
Pre-Flight, and no reliabilities have been reported. 


Tests Administered as Part of the Cadet Selection Battery


12. Aviation Classification Test (Forms 3 and 4):³ measures general academic intelligence and includes sections on vocabulary; meter and dial reading; judgment; mathematics; and number, name, and symbol comparisons. A reliability of .92 has been reported (5).

13. Mechanical Comprehension Test (Forms 3 and 4): requires comprehension of the nature, operation, and effects of various physical principles rather than knowledge of specific tools or equipment. A reliability of .87 has been reported (5).

³ This test has recently been replaced by a newer form called the Aviation Qualification Test (4).

John T. Bair, Robert F. Lockman, and Charles T. Martoccia


Criterion Variables: U. S. Naval School, Pre-Flight Grades


14. Final Navigation Grade: includes the average 
of weekly quizzes and a final examination for nine 
weeks of dead-reckoning navigation and five weeks 
of celestial navigation. The quizzes and the ex- 
amination each are weighted 50 per cent in the final 
grade, which is converted to standard score form 
(as are all other grades in pre-flight school). 

15. Final Engines Grade: includes the average of 
weekly quizzes and a final examination, each weighted 
50 per cent. The course involves basic understand- 
ing of aircraft engines and their operation. 

16. Ground School Final Grade: computed at the end of pre-flight school as a weighted average of final grades in Navigation, Naval Orientation, Engines, Aerology, Principles of Flight, Physical Training, Military, and Study Skills.
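The 50/50 weighting used for the navigation and engines grades, followed by conversion to standard score form, might look like the sketch below. The article does not state the standard-score scale; the mean of 50 and SD of 10 used here are an assumption (suggested only by the pre-flight means near 50 in Table 1), and the function names are hypothetical. The weights behind the Ground School Final Grade are not given, so that grade is not modeled.

```python
from statistics import mean, pstdev


def raw_course_grade(quiz_scores, final_exam):
    """Average of weekly quizzes and the final examination, each weighted 50 per cent."""
    return 0.5 * mean(quiz_scores) + 0.5 * final_exam


def to_standard_scores(raw_grades, target_mean=50.0, target_sd=10.0):
    """Linear standard-score conversion within one class.

    The 50/10 scale is an assumption; the article says only 'standard score form'.
    """
    m = mean(raw_grades)
    sd = pstdev(raw_grades)
    if sd == 0:
        raise ValueError("all raw grades are equal; standard scores are undefined")
    return [target_mean + target_sd * (g - m) / sd for g in raw_grades]
```

In use, each cadet's raw grade would be computed first and the whole class converted together, e.g. `to_standard_scores([raw_course_grade(q, e) for q, e in class_records])`, where `class_records` is a hypothetical list of (quizzes, exam) pairs.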


Criterion Variables: Flight Training Grades 


17. Final K Stage (Basic) Grade: includes 10 in- 
structional and two check flights in field carrier 
landing practice. On each flight, the student is given 
a mark by his instructor of AA (above average), 
A (average), BA (below average), or U (unsatis- 
factory) on each of several maneuvers and pro- 
cedures. Letter grades are accumulated for the 12 
flights. Then a numerical grade is determined by 
multiplying each AA by four, each A by three, each 
BA by two, and each U by one. These weighted 
scores are summed and divided by the total number 
of letter grades, and converted to standard score 
form to obtain the final K stage grade. 

18. Final L Stage (Basic) Grade: includes six air- 
craft-carrier landings and is determined in a manner 
similar to K stage described above. 

19. Final Basic Flight Grade: includes all AA, A, 
BA, and U letter grades accumulated for 104 in- 
structional and check flights covering the 11 stages 
of basic flight training. These stages (in addition 
to K and L described above) were A—primary solo, 
B—precision, C—acrobatics, D—instruments and 
radio, E—night, F—formation, G—gunnery, H— 
primary combat, and I—cross-country navigation. 
The accumulated letter grades for these stages were 
weighted, averaged, and converted to standard score 
form in the manner described above. 

20. Final Advanced Flight Grade: includes all ad-
vanced flight training grades, which are computed in
the same way as final basic flight grades. Advanced
training stages are, in general, extensions of basic 
training stages, although varying with the type of 
aircraft in which the cadet specializes. 
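The letter-grade arithmetic described for K Stage (and reused for the L Stage, basic, and advanced grades) can be written out directly. This sketch stops at the weighted average on the 1 to 4 scale and omits the subsequent standard-score conversion. It assumes the input holds only the marks from regular flights, since extra flights are excluded from the final grade (footnote 4); the names are illustrative.

```python
GRADE_WEIGHTS = {"AA": 4, "A": 3, "BA": 2, "U": 1}


def numerical_flight_grade(letter_grades):
    """Weighted average of AA/A/BA/U marks accumulated over a stage.

    letter_grades: every mark given on every regular flight of the stage
    (e.g., all maneuvers on the 12 K Stage flights). Extra flights are
    assumed to have been filtered out already.
    """
    if not letter_grades:
        raise ValueError("no letter grades supplied")
    total = sum(GRADE_WEIGHTS[g] for g in letter_grades)
    return total / len(letter_grades)
```

For example, one mark of each kind averages to (4 + 3 + 2 + 1) / 4 = 2.5.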

Means, standard deviations, and product-moment correlation coefficients were computed for these 20 variables. Table 1 presents the means and standard deviations and Table 2 the correlation (and residual) matrix.

⁴ A cadet is given extra flights if a regular flight is graded incomplete or unsatisfactory. These extra flights, however, are not included in the final grade.

Naval Air Training Predictor and Criterion Measures

Table 1

Means and Standard Deviations of Predictor and Criterion Variables*

Predictor                              Mean      SD
 1. MPFB                              51.31     7.21
 2. DAT-SR                            75.38    13.86
 3. GZ-SO                             31.59    10.30
 4. DAT-CS&A                          63.10    11.21
 5. Minn. Clerical, Numbers          117.43    23.18
 6. Minn. Clerical, Names            123.44    27.98
 7. TO                                85.57    26.64
 8. ACE-L                             49.72    11.28
 9. ACE-Q                             44.88    10.21
10. English                           48.60    10.88
11. Math                              40.94    13.87
12. ACT                               85.77    10.75
13. MCT                               60.10     6.51

Criterion                              Mean      SD
14. Navigation grade (PF)             49.49     7.16
15. Engines grade (PF)                49.39     7.78
16. Pre-flight ground final grade     50.36     5.63
17. K-Stage grade (Basic)              2.95      .08
18. L-Stage grade (Basic)              2.87      .21
19. Final basic flight grade           2.99      .07
20. Final advanced flight grade        3.05      .06

* The predictor and criterion variables are listed in the same order as in the body of the text.
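For reference, the product-moment coefficient behind Table 2 is the sum of cross-products of deviations divided by the square root of the product of the two sums of squared deviations. The sketch below uses only the standard library (Python 3.10 and later also provide `statistics.correlation`); the function names are illustrative, not the authors' procedure.

```python
import math
from statistics import mean


def pearson_r(x, y):
    """Product-moment correlation of two equal-length score lists."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length lists of at least two scores")
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def correlation_matrix(variables):
    """Full matrix of r's for a list of score lists (one list per variable)."""
    return [[pearson_r(a, b) for b in variables] for a in variables]
```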


Analyses and Results 
Validity Data 


The variables in Table 2, the correlation 
matrix, can be classified into two general 
categories: (a) training grades or proficiency 
criteria, and (b) psychological measures of 
aptitudes and abilities potentially predictive 
of training grades.⁵ No significant correla-
tions were found with K or L Stage grades.
All other criterion variables correlated signifi- 
cantly with three or more predictor variables. 
The classification of variables as predictors or criteria facilitates a search for the “best” predictor combinations, that is, those with the highest multiple validity. Jenkins’ improved short-cut method for multiple correlation (10) was used for this purpose, and the results are given in Table 3. For each criterion, the multiple correlation coefficient with MCT and ACT scores is also presented to contrast the validity of these standard cadet selection tests with that of the best combination of all predictor measures.

⁵ Significant validity coefficients are shown in boldface type in Table 2.
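Jenkins’ short-cut procedure (10) is not reproduced here. As a point of comparison, the sketch below computes a multiple correlation by the standard normal-equations route, R² = r′R⁻¹r, where R holds the predictor intercorrelations and r the predictors' validities for one criterion. Table 2 is not part of this section, so the test inputs are made-up values, and the helper names are hypothetical.

```python
import math


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("predictor intercorrelation matrix is singular")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def multiple_r(r_xx, r_xy):
    """Multiple correlation of one criterion with a set of predictors.

    r_xx: predictor intercorrelation matrix; r_xy: predictor-criterion validities.
    """
    beta = solve(r_xx, r_xy)  # standardized regression (beta) weights
    r_squared = sum(b * v for b, v in zip(beta, r_xy))
    return math.sqrt(r_squared)
```

Two uncorrelated predictors with validities .3 and .4 give R = .5; if the same predictors overlap, R rises less, which is why a new test adds validity only to the extent it measures something the battery does not.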

The batteries of experimental predictor 
measures so derived resulted in significantly 
greater multiple validities for all grade cri- 
teria, except final basic flight grade, than 
those obtained with ACT and MCT. Seven 
predictors in varying numbers and combina- 
tions accounted for the multiple correlations 
achieved: Essentials of Mathematics, GED 
Correctness and Effectiveness of Expression, 
Minnesota Paper Form Board, Navy MCT, 
Minnesota Clerical (Names and Numbers), 
ACE-L, and Topographical Orientation. 

It is interesting to note that none of the 
predictor variables related significantly to K 
and L Stage grades, and the correlation be-
tween these two stage grades was .28. This
seems to be an unexpectedly low relationship 
for grades given the same individuals in two 
successive training stages presumed to be 
highly related, that is, field carrier landing 
practice and actual carrier landings. 


Factor Analysis 


Four significant factors were extracted from 
the original correlation matrix using Thur- 
stone’s centroid method (14). The unrotated 
factor matrix is presented in Table 4. The 
residuals after extraction of the fourth factor 
are included in Table 2. There were but two 
significant residuals remaining at this point, 
meeting Guilford’s criterion for a stopping 
place in factoring (8). 
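Thurstone's centroid method extracts one factor at a time: with communality estimates in the diagonal, each variable's loading on the first centroid factor is its column sum divided by the square root of the sum of all entries in the matrix, and the next factor is extracted from the residual matrix. The sketch below covers only the first factor and its residuals. It omits the sign reflections needed before later factors, and it assumes the highest absolute correlation in each column as the communality estimate, which the article does not specify.

```python
import math


def with_highest_r_diagonal(r):
    """Copy of r with each diagonal cell replaced by the largest absolute
    off-diagonal correlation in that column (one common communality estimate)."""
    n = len(r)
    out = [list(row) for row in r]
    for j in range(n):
        out[j][j] = max(abs(r[i][j]) for i in range(n) if i != j)
    return out


def first_centroid_factor(r):
    """Loadings on the first centroid factor: column sum / sqrt(grand sum)."""
    col_sums = [sum(row[j] for row in r) for j in range(len(r))]
    total = sum(col_sums)
    if total <= 0:
        raise ValueError("grand sum must be positive; reflect variables first")
    return [s / math.sqrt(total) for s in col_sums]


def residual_matrix(r, loadings):
    """Residual correlations after removing one factor: r_jk - a_j * a_k."""
    n = len(r)
    return [[r[j][k] - loadings[j] * loadings[k] for k in range(n)] for j in range(n)]
```

When a matrix is exactly one-factor (r_jk = a_j a_k with true communalities in the diagonal), the first centroid factor recovers the loadings and the residuals vanish, which is the situation the stopping criterion cited above tests for.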

The four factors were rotated orthogonally⁶ into a satisfactory approximation of a simple structure using Zimmerman’s graphical method (15). Factor loadings after four rotations are given in Table 5. A loading of .40 or greater is considered significant. All four factors were overdetermined by Thurstone’s criterion (14).
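Zimmerman's graphical method rotates one pair of axes at a time by an angle read from a plot. The sketch below replaces the plot with an explicit angle (a hypothetical input) and shows why an orthogonal rotation leaves each variable's communality, the sum of its squared loadings, unchanged. The test uses the unrotated loadings of Variable 1 from Table 4, whose squares sum to .4039, printed there as 40.

```python
import math


def communality(loadings):
    """Sum of squared loadings for one variable."""
    return sum(x * x for x in loadings)


def rotate_pair(loadings, i, j, angle_deg):
    """Rotate factor axes i and j by angle_deg (an orthogonal plane rotation)
    and return the variable's loadings on the new axes."""
    t = math.radians(angle_deg)
    a, b = loadings[i], loadings[j]
    out = list(loadings)
    out[i] = a * math.cos(t) + b * math.sin(t)
    out[j] = -a * math.sin(t) + b * math.cos(t)
    return out


def rotate_matrix(matrix, i, j, angle_deg):
    """Apply the same plane rotation to every row of a factor matrix."""
    return [rotate_pair(row, i, j, angle_deg) for row in matrix]
```

Because cos² + sin² = 1, the squared loadings on the two rotated axes always sum to the same amount as before; this is why Table 5 communalities differ from Table 4 only through graphical and rounding error.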

It is noted that criterion variables were in- 
cluded in the factorization. They have face 
validity in that they represent extensions of 
the predictor variables into everyday training 
situations. In addition, they aid in factor 
interpretation. 

Variable 5 has the highest loading on Factor I and requires the accurate and rapid perception of similarities and differences between two sets of numbers. Variable 4 has the next highest loading and involves speed and accuracy in comparing number and letter combinations; Variable 1 requires speed of perception of two-dimensional spatial objects and their mental manipulation. Variable 13 requires an ability to perceive the operation of common mechanical tools and items and to determine various physical principles from these operations. Factor I, then, can be described as perceptual analysis, with speed of visualization of numbers, letters, and two-dimensional objects playing a major role and verbal comprehension a negligible one. This factor is unique in that there were no significant loadings on it for any of the criterion variables. It accounts for 11 per cent of the total variance.

⁶ The authors are indebted to LTJG H. Paul Kelley, MSC, USNR, for an oblique solution which resulted in a verification of the factors derived orthogonally.

Table 3

Multiple Validity Data*

                                    Predictors
Grade Criterion              R      1          2          3          4
Navigation                  .72     Math       English    MC-Nos.
                            .56     ACT        MCT
Engines                     .65     MCT        MC-Names   Math       TO
                            .46     MCT        ACT
Pre-flight ground final     .81     Math       ACE-L      MCT        MC-Names
                            .67     ACT        MCT
Final basic flight          .30     MPFB       MCT
                            .28     MCT        ACT
Final advanced flight       .38     English    MPFB       TO
                            .24     MCT        ACT

* In order of contribution to explained variance.

Table 4

The Unrotated Factor Matrix*

Variable Number and Description      I     II    III    IV    Communality
 1. Minn. Paper Form Board          45    -33   -22     21        40
 2. DAT Space Relations             52    -07   -07     40        44
 3. G-Z Spatial Orientation         57    -28   -15     30        52
 4. DAT Clerical                    32    -37   -22    -34        40
 5. Minn. Clerical-Numbers          52    -21   -50    -31        67
 6. Minn. Clerical-Names            60     09    41    -29        63
 7. Topographical Orientation      -42    -17   -16    -18        27
 8. ACE Psych. Exam.-L              64     40    20     07        62
 9. ACE Psych. Exam.-Q              65     09    27    -07        50
10. GED English                     60     34    11     27        55
11. Mathematics                     69     25   -09     15        56
12. Aviation Classification         66     42    19    -08        65
13. Mechanical Comprehension        47     11   -27    -44        50
14. Navigation PF                   81     12   -17     22        75
15. Engines PF                      53     40   -34    -08        57
16. Pre-flight ground final         84     39   -23     11        93
17. K Stage (Basic)                 10    -42    41     20        39
18. L Stage (Basic)                 02    -28    14     14        12
19. Final basic flight              29    -43    50    -08        52
20. Final advanced flight           38    -18    17    -11        21

* Decimal points omitted.

Table 5

The Rotated Factor Matrix*

Factor I: Perceptual; Factor II: Academic Potential; Factor III: Comprehension of Relationships; Factor IV: Applied Spatial

Variable Number and Description      I     II    III    IV       Communality
 1. Minn. Paper Form Board          60     13    10     21           43
 2. DAT Space Relations             06     48    10     44           44
 3. G-Z Spatial Orientation         28     40    07     53           52
 4. DAT Clerical                    64    -03    06    [illeg.]      43
 5. Minn. Clerical-Numbers          77     32   -01     03           69
 6. Minn. Clerical-Names            23     12    76     06           65
 7. Topographical Orientation       10    -36   -32    -15           26
 8. ACE Psych. Exam.-L             -03     54    58     02           63
 9. ACE Psych. Exam.-Q              17     31    61     16           52
10. GED English                    -10     60    41     15           56
11. Mathematics                     14     65    34     11           57
12. Aviation Classification         07     51    64    -09           68
13. Mechanical Comprehension        57     29    25    -25           53
14. Navigation                      25     73    30     29           77
15. Engines                         28     65    20    -22           58
16. Final ground school             25     86    37     02           94
17. K Stage                        -04    -25    22     54           40
18. L Stage                         00    -14    02     32           12
19. Final basic flight              17    -29    48     45           54
20. Final advanced flight           26    -02    35     21           23

Factor Intercorrelations (Rank-Order Method)
I-II      .00
I-III    -.34
I-IV     -.12
II-III    .30
II-IV    -.21
III-IV   -.09

* After four rotations final communality values differ slightly from the unrotated values due to the graphical method of rotation and rounding errors. Again, decimal points have been omitted.

Variables 16 and 14 have the highest load-
ings on Factor II. These are academic train-
ing grades: final pre-flight school grade and 
final pre-flight navigation grade, respectively. 
Variables 11, 10, 8, and 12 deal with aca- 
demic ability involving the diagnosis of prob- 
lems and the development of rules and prin- 
ciples from a set of objects. Variable 15 is 
another academic training grade, final pre- 
flight engines grade. The remaining vari- 
ables, 2 and 3, require an ability to educe 
spatial orientation concepts useful in tech- 
nical academic courses such as navigation. 
Factor II can be identified as an academic 
potential factor particularly applicable to 
technical academic work. It accounts for 20 
per cent of the total variance. 
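The percentage of total variance attributed to a factor is the sum of its squared loadings divided by the number of variables. Applying this to the Factor II column of Table 5 (decimal points restored; signs do not affect the result) reproduces the reported 20 per cent. The function name is illustrative.

```python
def percent_of_total_variance(factor_column, n_variables=None):
    """Sum of squared loadings on one factor, as a percentage of the total
    variance of n_variables standardized variables."""
    n = len(factor_column) if n_variables is None else n_variables
    return 100.0 * sum(x * x for x in factor_column) / n


# Factor II column of Table 5, Variables 1-20.
FACTOR_II = [0.13, 0.48, 0.40, -0.03, 0.32, 0.12, -0.36, 0.54, 0.31, 0.60,
             0.65, 0.51, 0.29, 0.73, 0.65, 0.86, -0.25, -0.14, -0.29, -0.02]

print(round(percent_of_total_variance(FACTOR_II)))  # 20
```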


Variable 6 has the highest loading on Factor III. It involves rapid and accurate discrimination between two sets of proper names. Variables 12, 9, and 8 have the next highest loadings, and all require the comprehension of concepts and their application to new situations. Variable 19 is the total basic flight training grade and could be considered the application of principles and procedures learned in ground school and in flight training. Variable 10 measures the understanding of correct language usage rather than factual knowledge. Factor III could be described as comprehension of relationships, particularly as related to the understanding of oral and written instructions and the application of rules and principles to actual flight situations. Whereas Factor II seems to involve more of an inductive reasoning process, Factor III seems to be primarily deductive in nature. Factor III accounts for 14 per cent of the total variance.


for 7 per cent of the total variance.

Although Table 5 indicates fairly low fac- 
tor intercorrelations, there is a logical de- 
pendence among three of them. Factor II
includes the development and learning of
principles required for flight training essen- 
tially on a didactic level. Factor III in- 
cludes the ability to apply these principles to 
over-all flight training in a general manner, 
and Factor IV involves the discrete ability to 
apply necessary concepts of spatial relations 
to specific flying situations. 


Summary and Conclusions 


The chief purpose of this investigation was 
to interrelate certain tests of spatial and per- 
ceptual abilities, tests of other differential 
abilities, and proficiency measures in the 
Naval Air Training Program. It was found
that: 


1. The most substantial relationships existed between tests of academic aptitude and grades in the pre-flight phase of training.

2. Tests of spatial and perceptual abilities correlated highest with final basic and advanced flight grades.

3. Four significant factors derived by fac- 
tor analysis were: perceptual, academic po- 
tential, comprehension of relationships, and 
applied spatial relations. 

4. Although the inclusion of criterion vari- 
ables did not reveal any new factors, it did 
aid considerably in defining those factors 
found. 

5. Since only 51 per cent of the total variance was accounted for by the four factors described, there may well be other factors that would account for some of the variables employed. It is also possible that some of these variables would cluster with factors still unidentified.


Received September 22, 1955.


References 


1. Ambler, Rosalie K. Preliminary evaluation of 
two forms of the Spatial Apperception Test. 
U. S. Naval Sch. Aviat. Med., 1953, Project 
NM 001 057.04.04. 

2. Bennett, G. K., Seashore, H. G., & Wesman, A. G. Manual for the Differential Aptitude Tests. New York: Psychological Corp., 1952.

3. Clark, W. B., & Malone, R. D. The relationship of topographical orientation to other psychological factors in naval aviation cadets. U. S. Naval Sch. Aviat. Med., 1952, Project NM 001 05901.3200- vate.

4. Davis, F. B. Development of the Aviation 
Qualification Test (Forms 5 and 6). 1953. 
(Contract Nonr 758[008].) 

5. Fiske, D. W. Validation of naval aviation cadet 
selection tests. J. appl. Psychol., 1947, 31, 
601-614. 

6. Flanagan, J. C., et al. (Eds.) AAF Aviation Psychology Program. Washington: U. S. Government Printing Office, 1947. (AAF Aviat. Psychol. Program Res. Rep. Nos. 1-19.)

7. Guilford, J. P. (Ed.) Printed classification tests. Washington: U. S. Government Printing Office, 1947. (AAF Aviat. Psychol. Program Res. Rep. No. 5.)

8. Guilford, J. P. Psychometric methods. New York: McGraw-Hill, 1936.

9. Guilford, J. P., & Zimmerman, W. S. A manual 
for the Guilford-Zimmerman Aptitude Sur- 
vey. Beverly Hills, Calif.: Sheridan Supply 
Co., 1947. 

10. Jenkins, W. L. An improved method for multi- 
ple R. Educ. psychol. Measmt, Summer, 1952, 
12, 316-322. 

11. Page, H. E. The pilot candidate selection research program: historical background and organization. USAF Sch. Aviat. Med. Proj. Rep., 1950, Proj. No. 21-29-008. (Rep. No. NM 001 057.04.01.)

12. Payne, R. B., Rohles, F. H., & Cobb, B. B. The pilot candidate selection research program: test validation and intercorrelations. USAF Sch. Aviat. Med. Proj. Rep., 1952, Proj. No. 21-29-008; BuMed Project No. NM 001 057.

13. Super, D. E. Appraising vocational fitness. New York: Harper, 1949.

14. Thurstone, L. L. Multiple-factor analysis. Chicago: Univer. of Chicago Press, 1947.

15. Zimmerman, W. S. A simple graphical method for orthogonal rotation of axes. Psychometrika, 1947, 11, 51-55.




The Journal of Applied Psychology 
Vol. 40, No. 4, 1956 


Dimensional Analysis of Motion: X. Experimental
Evaluation of a Time-Study Problem¹


Donald Hecker, Donovan Green, and Karl U. Smith 


The University of Wisconsin 


Time-study technique represents a method 
of measurement of human performance in the 
industrial task from which quantitative stand- 
ards of work output may be derived. In the 
direct application of time-study technique, the 
engineer breaks up a task into its elements, 
called therbligs, and times these elements 
separately. Standards of output are derived 
from these measures by eliminating or cor- 
recting for those elements and conditions of 
the task that do not represent the “critical” 
factors and conditions defining the actual mo- 
tion generally to be performed. 

The most widespread use of time-study 
technique involves the direct method just de- 
scribed. Increasingly, however, industry is 
making use of predetermined time standards, 
i.e., a set of values derived from timing a re- 
stricted area of work, which are generally ap- 
plied to other industrial tasks. The use of 
such restricted time measurements and stand- 
ards has stimulated interest in the scientific 
validity of all time-study concepts and practices.

The scientific investigation of time-and- 
motion study may take several directions. 
One approach has involved the study of the 
accuracy of the rating methods used in cor- 
recting or leveling obtained times in order to 
establish a standard of performance (1, 3). 
Another line of attack involves evaluation of 
the methods of measurement and the validity 
of their application. 

Several studies (2, 4, 5, 6, 7, 8) have been 
conducted thus far which show that there are 
grave limitations in the concept of independ- 
ent movement elements, to which a standard 
of fixed time can be assigned as a result of 
measurement. The work thus far conducted along this line has involved observation of the differential effects of learning, of fatigue, of perception, and of the distance of movement upon the separate therbligs making up a motion.

¹ This research has been supported by funds from the Graduate School Research Committee, the University of Wisconsin.

The present study attempts to investigate 
directly a basic problem of present time- 
study methods and of the application of pre- 
determined time standards in industry. To 
what extent are the component movements or 
therbligs in motion interdependent? Will the 
changing of the nature of one movement affect 
the duration of an adjacent motion in the 
task? Specifically, as this study deals with 
the problem, will different types of manipula- 
tive movements bring about a variation in the 
duration of a travel movement of a fixed 
length? 

There are only very limited prior observa- 
tions on the problems of interaction of com- 
ponent movements in motion. Limitations in 
methods of motion analysis hitherto used 
have restricted investigation of such prob- 
lems. One of the main aspects of the present 
research has been to overcome such limita- 
tions in experimental methodology by de- 
velopment and application of special elec- 
tronic techniques of motion study. 


Methods 


As just noted, special electronic methods of mo- 
tion analysis have been devised to conduct this in- 
vestigation. In addition to these methods, a pre- 
planned work situation is used to control and vary 
the perceptual and reactive characteristics of the
performance task. 


Apparatus 


Figure 1 presents a photograph of the apparatus 
used in this experiment. The work panel, contain- 
ing in this case two vertical rows of four push but- 
tons each, is shown to the right in the figure. The 
electronic motion analyzer is housed in the small switching unit, shown to the far left. The two electronic interval timers located on the back of this table are used to measure separately the travel and manipulation aspects of motion.

Fig. 1. Experimental set-up showing the work panel to the right with the two manipulation boards in place. The motion analyzer and electronic interval timers are shown on the table to the back.

During an experimental observation on this particular setup, the subject operator (S) stands before
the large work panel and, on instruction, pushes
with his preferred hand the push-button switches 
located on the work panel. He starts with the top 
switch on the left, crosses to the top one on the 
right, back to the second one on the left, and so on 
until all switches are pushed. When S makes con- 
tact with the first switch, he automatically activates 
one of the precision interval timers, the one used 
to measure the duration of manipulation. As long as 
he stays in contact with this first switch, this ma- 
nipulation timer runs. As soon as S breaks contact
with the first switch, to move to the second, the 
manipulation clock stops and the second, the travel 
movement timer, starts to run. This timer continues 
until the second push button is touched, when it 
stops and the manipulation timer is turned on for a 
second time. As S presses the buttons successively, 
the two clocks automatically record the duration of 
each manipulation and travel movement in the task. 
When the last push button switch is operated, the
two clocks stop automatically. 

The electronic motion analyzer can be thought of as consisting of two circuits, Sm and St. One of these circuits is an open circuit, Sm, which is closed by S when he touches any of the knobs or switches on the work panel. The other circuit, St, is closed when this first circuit is open.

Figure 2 illustrates diagrammatically the circuits of 
the motion analyzer. The S is shown as the circle 
at the bottom of the diagram. The dotted lines rep-
resent the sweep of his arm from the manipulation 
board on the left to the manipulation board on 
the right. Four identical manipulation devices are 
mounted on each of the removable boards on the 
panel. Each of these devices is connected in com- 
mon to the left side of a double switching circuit, 
Sm. A metallic rod which the operator holds in his 
nonpreferred hand is also connected with this side 
of the switching relay. When the operator manipu-
lates a knob on the work panel, the subject circuit, 


Sm, is closed and the second circuit, St, automati- 
cally opens. This closing of the subject circuit starts 
the precision timer, Mt, which measures the time of 
contact or manipulation time. When S breaks con- 
tact with the first knob and travels toward the next, 
the subject circuit is reopened, and the other circuit, 
St, is closed, activating the second timer, Tt. On 
successive manipulation and travel movements, the 
durations of these component movements are ac- 
cumulated on the timers. Touching the lower knob 
or switch on the right stops all recording. 
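The bookkeeping performed by the two timers can be mimicked in software. This is a simplified model, not the authors' circuit: it assumes a list of touch and release times, one pair per device in the order operated, and adds each contact interval to the manipulation total (Mt) and each gap between contacts to the travel total (Tt). As in the apparatus, touching the last device ends recording, so its own manipulation is not timed. The function name and input format are hypothetical.

```python
def accumulate_timers(contacts):
    """Simulate the manipulation timer (Mt) and travel timer (Tt).

    contacts: [(touch_time, release_time), ...] in seconds, one pair per
    device, in the order the devices were operated (hypothetical input).
    Returns (manipulation_total, travel_total, movements_timed).
    """
    manipulation = 0.0
    travel = 0.0
    for k in range(len(contacts) - 1):
        touch, release = contacts[k]
        next_touch = contacts[k + 1][0]
        manipulation += release - touch  # Sm closed: Mt runs while S touches the device
        travel += next_touch - release   # Sm open, St closed: Tt runs between devices
    # Touching the final device stops both timers, so its manipulation is never timed.
    return manipulation, travel, max(len(contacts) - 1, 0)
```

With the eight devices of a trial this yields seven timed manipulations and seven travel movements, matching the count given in the Procedure.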

Several features of this high-precision electronic motion analyzer should be noted. The S is not stimulated by the current passing through his body, inasmuch as it is at subthreshold level. Only vacuum-tube relays are used in this special design of the analyzer circuit; accordingly, its precision is limited only by the rapidity of the emission characteristics of the tubes. Finally, electronic interval timers are employed that provide time registration to an accuracy approaching 1/100,000th of a second. Because of calibration differences between interval timers, we estimate that readings to an absolute accuracy of .001 second are obtained.

Fig. 2. Diagram of the circuit relations involved in the electronic method of motion analysis.


Procedure 


Eight different types of manipulation were studied 
in this experiment: (a) a clockwise turning of the 
hand, used to operate a knob-type turn switch; (b) 
a vertical switching movement, used to throw a 
toggle switch downward; (c) a pushing movement 
of the thumb, needed to press the push button 
switches shown in Fig. 1; (d) a pulling movement 
on a small latch device, carried out by grasping the 
latch between thumb and forefinger; (e) a dial- 
setting motion, requiring rotation of a 2-inch dial 
arm on a marked dial face; (f) a thumb-forefinger 
pressure movement, made by squeezing a latch de- 
vice; (g) a lateral switching motion to the right 
hand, using a toggle-type switch; and (h) a counterclockwise turn movement, using the same type of turn switch as in a.

The different devices used to obtain these eight 
different kinds of manipulations were mounted on 
boards as already described in connection with Fig. 
1. These manipulation boards could be quickly 
mounted on and removed from the work panel so 
that the type of manipulation performed could be 
changed from trial to trial in an experimental ses- 
sion. All the latches, switches, and knobs used in 
the experiment could be moved easily, involving manipulation movements of 0.25 to 0.5 inch. The horizontal distances between switches or knobs on the two sides of the panel were kept constant at 24 inches.

This study was divided into two separate experi- 
ments, the first consisting of observations based on 
the first four of the eight types of manipulations 
listed above, and the second involving observations 
based on the last four. The procedure for each of 
the two experiments was identical. Male and female 
college students from elementary and intermediate 
courses in psychology were used as subjects. In per- 
forming a given type of manipulation, S stood be-
fore the work panel, operating each of the identical
manipulation devices on the two sides of the panel 
alternately, beginning with the uppermost device on 
the left board and ending with the lowest on the 
right board. A total of eight manipulation and 
seven travel movements occurred, therefore, in any 
single trial. Since touching the last switch stops the 
clocks, only seven manipulation movements were recorded.

In a given experimental period Ss were given three 
trials on each of the four types of manipulation 
used. Each experiment was run for four successive 
days. The four conditions of manipulation in each experiment, which are the main experimental variables, give 24 possible sequences of observation. In order to eliminate sequence effects, 24 Ss were assigned randomly to the 24 possible sequences. Accordingly, all possible sequences of the four conditions of manipulation were used in each experiment, and one S was assigned to each sequence. On each day, then, each S was given three consecutive trials on each condition of manipulation in turn, giving 12 trials in all.
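The counterbalancing scheme follows from the fact that four conditions can be ordered in 4! = 24 ways. The sketch below enumerates the orders, shuffles them, and gives one to each of 24 subjects, then expands a subject's order into the 12 trials of one day. The subject labels, condition names, and seed are illustrative assumptions.

```python
import itertools
import random

# Conditions of the first experiment, (a) to (d) in the Procedure.
CONDITIONS = ["turn", "toggle", "push", "pull"]


def assign_sequences(subjects, conditions, seed=None):
    """Give each subject a different order of the conditions, chosen at random.

    Requires exactly one subject per possible order (4! = 24 for four conditions).
    """
    orders = list(itertools.permutations(conditions))
    if len(subjects) != len(orders):
        raise ValueError(f"need {len(orders)} subjects, got {len(subjects)}")
    rng = random.Random(seed)
    rng.shuffle(orders)
    return dict(zip(subjects, orders))


def daily_trials(order, trials_per_condition=3):
    """Expand one subject's order into the day's trial list (3 x 4 = 12 trials)."""
    return [cond for cond in order for _ in range(trials_per_condition)]
```

Because every order is used exactly once, each condition appears in each serial position for six subjects, so practice and fatigue effects are spread evenly across conditions.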

The recorded data of this study are the separate 
manipulation and travel times for each trial. The 
median scores for each component movement, based 
on the three trials for each condition of manipula- 
tion, were used in the analysis of the data. Separate
analyses of variance were carried out for the travel- 
time scores in the two experiments. 
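The scoring step can be sketched as follows: each subject's three trials per condition are reduced to a median, and the medians are then averaged across subjects, which is assumed here to be how the condition means plotted in Figure 3 were formed. The last helper expresses each condition mean as a percentage of a chosen base condition, the form used for the knob-turning comparison in the Results. The data layout and names are assumptions.

```python
from statistics import mean, median


def subject_medians(trials):
    """trials: {condition: [travel times of the three trials]} -> {condition: median}"""
    return {cond: median(times) for cond, times in trials.items()}


def condition_means(data):
    """data: {subject: {condition: [three travel times]}}.
    Returns the mean, across subjects, of each subject's median per condition."""
    medians = [subject_medians(trials) for trials in data.values()]
    conditions = medians[0].keys()
    return {cond: mean(m[cond] for m in medians) for cond in conditions}


def percent_of_base(means, base):
    """Express each condition mean as a percentage of the base condition's mean."""
    return {cond: 100.0 * value / means[base] for cond, value in means.items()}
```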


Results²


The main results of this study deal with 
the question, will the variation in the type of 
manipulation produce significant differences in 
the duration of a travel movement of con- 
stant length? The findings will be discussed 
in terms of two separate experiments, each 
dealing with four different types of manipu- 
lation. For each experiment, data will be 
presented separately for levels of skilled per- 
formance and for performance during learn- 
ing. In addition, data concerning the meas- 
ures of manipulation movements themselves 
will be mentioned. 


Interaction of Component Movements in 
Skilled Performance 


Figure 3 summarizes the differences in the
duration of a travel movement of the arm, 
24 inches in length, that occurred in relation 
to the eight different types of manipulation 
used in the experiments. Figure 3A gives 
the data for the first experiment and 3B for 
the second. The bar graphs represent the 
mean duration of the travel movement for 
the different types of manipulation on the 
last day of training, i.e., Day 4.

Figure 3 shows that, during skilled performance, very marked differences occur in a travel movement of fixed length when this movement is associated with different types of manipulation. In the first experiment, significant differences are found in the travel movement for all types of manipulation except the pull movement and pressing the push button. The bracket in Fig. 3A indicates that the difference between these two forms of manipulation, as evaluated in terms of the Duncan Range Test, is not significant. Significant differences also occur in the travel movement under the four conditions of manipulation in the second experiment. The dial-setting manipulation produces a travel movement markedly different from that observed in the other three conditions. In terms of the Duncan Range Test, this difference is significant at the .01 level.

[Figure 3: bar graphs of travel-movement duration. Panel A: Turn, Toggle, Pull, Push. Panel B: Pressing Down, Pressing Together, Turning Left, Dial Setting.]

Fig. 3. Differences in the duration of a travel movement with different types of manipulation. Fig. A gives the data for the first experiment and Fig. B that for the second.

² The measures on which the statistical analyses of this experiment are based, along with pertinent summary tables not included in the presentation of the results, have been deposited with the American Documentation Institute. Order Document No. 4810 from ADI Auxiliary Publications Project, Photoduplication Service, Library of Congress, Washington 25, D. C., remitting in advance $1.25 for microfilm, or $1.25 for photocopies. Make checks payable to Chief, Photoduplication Service, Library of Congress.

The percentage differences in a travel movement related to different conditions of manipulation may be described as follows. In the first study, if we take the travel-movement duration under the knob-turning manipulation condition as a 100 per cent base, then the other three conditions are, in order, 112, 122, and 128 per cent of that base time. In this first study, the travel movement related to the push-button manipulation was of longest duration. In the second study, if we let the duration of the travel movement associated with the downward switching manipulation represent a 100 per cent base, the durations of the travel movement in the other three conditions are, in order, 104, 104, and 152 per cent of this base time. The manipulation condition giving the biggest difference in the travel movement in relation to all other conditions is the dial-setting task.

[Fig. 4: travel-movement duration in seconds plotted against days of practice (1 to 4). Panel A curves include Push and Pull; Panel B curves: Dial Setting, Turning Left, Pressing Together, Pressing Down.]

Fig. 4. Learning curves for a travel movement of fixed length under different conditions of manipulation. The top curves (A) are for the first experiment and the bottom curves (B) for the second. Note the difference in the scale for the two curves.
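The percentage figures above are each condition's travel duration divided by the base condition's duration. A minimal sketch, with hypothetical function names: it also shows that the 52 per cent change quoted in the Discussion is simply the dial-setting value of 152 per cent less the 100 per cent base. The durations in the last line are invented solely to illustrate the conversion.

```python
def percent_of_base(durations, base):
    """Express each condition's mean travel duration as a percentage of the base condition."""
    base_time = durations[base]
    return {cond: 100.0 * t / base_time for cond, t in durations.items()}


def largest_change(percentages):
    """Largest departure from the 100 per cent base, in per cent."""
    return max(abs(p - 100.0) for p in percentages)


# Percentages reported in the text (base condition listed first).
first_study = [100, 112, 122, 128]
second_study = [100, 104, 104, 152]
print(largest_change(first_study))   # 28.0
print(largest_change(second_study))  # 52.0

# Hypothetical mean durations in seconds, only to show the conversion.
print(percent_of_base({"base": 2.0, "other": 3.04}, "base"))
```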


Interaction of Travel and Manipulation in 
Learning 


Figure 4 presents learning curves for the 
travel movement observed in this study. 
These curves are based on four days of prac- 
tice. Separate curves are drawn for the eight 
different conditions of manipulation used in 
the study. Figure 4A gives the curves ob- 
tained in the first experiment and 4B those 
obtained in the second. The time scales of 
the two sets of curves are different because 
of the marked differences in the level of the 
functions in the second experiment. 

There are two major points to be noted 
about the two sets of curves of Fig. 4. The 
first is that marked learning effects occur in 
the travel motion for some conditions of ma- 
nipulation and not for others. The right- 
turn manipulation and the downward switch- 


Donald Hecker, Donovan Green, and Karl U. Smith 


ing motion give travel movements that change 
hardly at all throughout practice. The left- 
turn, squeeze, upward-switching (Toggle), 
and pulling manipulations are associated with 
travel movements that show slight changes 
due to learning. The push-button and dial- 
setting manipulations give travel motions that 
show marked learning effects. Thus, the de- 
gree to which practice will change the dura- 
tion of a travel motion of constant length will 
depend upon the type of manipulation in- 
volved. 

Separate analyses of variance were carried out for the measurements of travel movements in the two different experiments. These analyses are summarized in Table 1. In both experiments, the critical variable, conditions of manipulation, is significant at the .01 level. Results similar to these general findings have already been discussed for the differences on Day 4 of each experiment, as shown graphically in Fig. 3. In the general analyses of Table 1, it is also observed that the variable "Days" turns out to be significant, as well as the Condition × Days interaction. These findings give some additional meaning to the learning curves discussed in relation to Fig. 4. Significant learning effects occur in the travel movements under certain conditions of manipulation, but the specific condition defines the nature of these effects.


Table 1
Summary of the Analysis of Variance of the Travel-Time Data

Experiment 2
Source                           df       SS    Mean Square          F
Days                              3     4.199         1.399    11.476**
Subjects                         23    22.776          .990    52.732**
Conditions                        3   108.241        36.080   203.073**
Days × Subjects                  69     8.415          .122     6.494**
Days × Conditions                 9     3.258          .362    19.276**
Conditions × Subjects            69    12.259          .178     9.361**
Conditions × Subjects × Days    207     3.882          .018

Experiment 1
Source                           df       SS    Mean Square          F
Days                              3     1.318          .439     6.855**
Subjects                         23    12.868          .560     8.730**
Conditions                        3    20.254         6.751   105.342**
Days × Subjects                  69     5.746          .083     1.299
Days × Conditions                 9      .515          .057      .839
Conditions × Subjects            69    12.029          .174     2.720**
Conditions × Subjects × Days    207    13.267          .064

** Significant at the 1% level.
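The F ratios in the Experiment 1 half of Table 1 can be recomputed from the printed sums of squares. This sketch assumes that each listed effect in Experiment 1 is tested against the Conditions × Subjects × Days mean square, which the printed F values for these rows are consistent with to within rounding. In the Experiment 2 half, the printed F ratios for Days and Conditions instead correspond to the Days × Subjects and Conditions × Subjects mean squares as error terms, so the sketch is limited to Experiment 1.

```python
# (sum of squares, degrees of freedom) as printed for Experiment 1 in Table 1.
EXPERIMENT_1 = {
    "Days": (1.318, 3),
    "Subjects": (12.868, 23),
    "Conditions": (20.254, 3),
    "Days x Subjects": (5.746, 69),
    "Conditions x Subjects": (12.029, 69),
}
# Conditions x Subjects x Days, assumed here to be the error term for every effect.
RESIDUAL = (13.267, 207)


def f_ratios(effects, residual):
    """F = (SS_effect / df_effect) / (SS_error / df_error) for each effect."""
    ms_error = residual[0] / residual[1]
    return {name: (ss / df) / ms_error for name, (ss, df) in effects.items()}


for name, f in f_ratios(EXPERIMENT_1, RESIDUAL).items():
    print(f"{name:<22} F = {f:.3f}")
```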


Comparison of the Duration of Different Manipulative Movements


In order to clarify further the main points of this study, a brief examination will be made of the differences in duration of the actual manipulative movements which were carried out. Figure 5 summarizes learning curves for the eight different manipulative movements during the four days of practice. Figure 5A gives the data for the first experiment and 5B for the second study. Again the second set of curves has been drawn on a different time scale than that used in the first study because of the marked time differences in the data concerning dial setting.

[Fig. 5: manipulation time in seconds plotted against days of practice (1 to 4); labelled curves include Dial Setting, Turning Left, and Pressing Together.]

Fig. 5. Learning curves over the four days of practice for the different types of manipulation. The top curves are for the first experiment and the bottom curves for the second. Note the differences in the time scale for the two sets of curves.

Examination of the curves of Fig. 5 shows 
that marked learning effects are found for all 
types of manipulation used. The types show- 
ing the least learning in the situation are the 
two turning movements and the pressing ma- 
nipulation. The sharpest learning effects are 
observed for dial setting, pressing up a toggle 
switch, and the pull manipulation. 

Some degree of interaction of the different 
learning functions is graphically portrayed in 
Fig. 5, but the instances of such effects are 
limited. In general, the original relative dif- 
ferences between the different forms of ma- 
nipulation are maintained throughout the 
learning period. It is to be noted that the 
perceptually loaded manipulation of dial setting gives the longest manipulation times, and the pull type of manipulation the shortest time.

The differences portrayed in Fig. 5A have 
been examined in terms of analysis of vari- 
ance in order to give an illustrative appraisal 
of the data obtained in this phase of the ex- 
periment. Conditions of manipulation were 
found to be significant at the .01 level of 
confidence. The variable days, which is the 
basis of learning effects in this study, also is 
significant at the .01 level. Accordingly, both 
the types of manipulation and learning effects 
are significant in this particular experiment. 


Discussion 


High-precision electronic methods of mo- 
tion analysis have been developed especially 
for the experimental investigation of the sys- 
tematic problems of industrial time-and-mo- 
tion study. These methods utilize special 
vacuum-tube switching circuits and electronic 
interval timers that permit timing of the dura- 
tion of different component movements at or 
beyond a precision of 0.001 seconds. In this
study these methods have been used to de- 
termine whether or not a travel movement of 
the arm of fixed length (24 inches) changes 
in duration when it is related to different con- 
ditions of manipulation. Eight different types 
of manipulation were used in the study. 

The main results of the experiments prove 
that there is marked interaction between the 




travel and manipulative parts of a skilled 
task. In skilled performance a travel move- 
ment of fixed length can be changed in dura- 
tion as much as 52 per cent with different 
types of manipulation. 

The interaction between travel and manipulation components of motion is found
also during learning. The extent to which
practice affects a travel movement of fixed 
length depends on the type of manipulative 
movement with which it is associated. This 
experiment proves that the presence or ab- 
sence of learning in a given travel movement 
may be determined entirely in terms of the 
interaction of this movement with manipula- 
tive reactions. 

In general, types of manipulation that are 
perceptually loaded, such as dial setting, or 
require exact positioning of the hand, such as 
a push-button manipulation, are associated 
with travel movements of relatively long dura- 
tion. Travel movements in these same con- 
ditions of manipulation also show the most 
marked changes during learning. 

Measurements of the duration of eight dif- 
ferent types of manipulation used in the 
study indicate that significant differences oc- 
cur in the duration of these movements. Of 
the types of manipulation studied, dial-set- 
ting manipulation gives the longest time, and 
pulling a small latch gives the shortest time. 

Present methods of time study in industry, 
including predetermined time-standard sys- 
tems, assume an independence of the com- 
ponent movements in motion. The presence 
of interaction between the component move- 
ments making up an industrial task acts as 
an error-producing factor in both direct time 
study and in the application of a predetermined time standard. The error in application of one widely used predetermined standard system is said to be of the order of 15
per cent. In this study we find that a given 
component movement in a task, a travel 
movement, may change in duration as much 
as 52 per cent due to its interaction with dif- 
ferent types of manipulation. 

At this point, it may be worthwhile to mention that the results just described are based on data secured from subject-operators after they had reached a skilled level of performance in the task situation that is not materially improved by further practice. Accordingly, this research is based on the motion analysis of a level of skill that is very comparable to sustained work in the industrial situation. Furthermore, it is our general notion that the Ss studied here were well motivated and highly cooperative, and that their work is equivalent in every major way to that of the industrial worker. Rather than receiving money for their work, Ss in this study received point credits toward their final grades in courses in psychology.

It is also our purpose to point out the com- 
pleteness of the motion analysis carried out 
here. If the motions investigated in this re- 
search had been measured by methods of 
micromotion analysis, using film speeds at 
100 frames per second to time the component 
movements, approximately 1,200,000 feet of 
film would have been necessary to conduct 
the work. The electronic methods of motion 
analysis permit relatively comprehensive ex- 
amination of the problems of time study in 
industry. 
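The film-footage comparison can be restated as filming time. The paper does not state the film gauge; this sketch assumes 16-mm film, which carries 40 frames per foot (35-mm film carries 16). Under that assumption, 1,200,000 feet exposed at 100 frames per second corresponds to roughly 133 hours of filming.

```python
# Assumption: 16-mm film (40 frames per foot); the paper does not state the gauge.
FRAMES_PER_FOOT_16MM = 40


def filming_hours(feet, frames_per_second=100, frames_per_foot=FRAMES_PER_FOOT_16MM):
    """Convert a length of film into hours of continuous filming."""
    frames = feet * frames_per_foot
    return frames / frames_per_second / 3600


print(round(filming_hours(1_200_000), 1))  # 133.3
print(round(filming_hours(1_200_000, frames_per_foot=16), 1))  # if 35-mm film were used
```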

In prior studies (2, 6, 8) it has been shown that learning affects the travel and manipulation movements in motion differently. Observations made in this study prove that the degree of learning which will occur in the travel and manipulative parts of a task depends not only upon the type of movement itself, but also upon the interaction of a movement with other component parts of the task. The rate of learning a travel movement of fixed length changes in relation to the types of manipulation with which this movement is associated.

The results just noted are not entirely nega- 
tive for the industrial applications of motion 
analysis. The basic problem of motion study 
in industry is a scientific one involving de- 
tailed understanding of the properties and 
causation of movements used in work. This 
research provides accurate measures of the 
interaction of manipulative and travel com- 
ponents of movement in common tasks which 
may be used in handling both practical and 
theoretical problems of motion study. Sim- 
plified concepts of elemental movements in 
motion are not an adequate theoretical foun- 
dation for industrial time-and-motion study. 




The interaction of the separate component 
movements in motion is a fundamental prob- 
lem not only in industry but also in general 
experimental psychology. Inadequate meth- 
ods of motion analysis have limited the study 
of this problem heretofore. 

It is a common assumption that the process 
of learning is the decisive factor in the de- 
termination of the integration of movements. 
This experiment proves otherwise. The find- 
ings of this study are that the perceptual and 
reactive make-up of one component move- 
ment in a task defines the role of learning 
itself on all parts of the task. The nature of 
one part of a motion, e.g., the manipulation 
component, will not only determine the ex- 
tent to which learning affects this movement, 
but also defines the role of learning in chang- 
ing other component movements in the task. 

The advances in methods of this experi- 
ment are perhaps more important than the 
specific results reported on movement inter- 
action. The electronic methods of motion 
analysis developed here make possible the 
broad experimental study of the integration 
and organization of movements in psycho- 
motor skill in relation to learning, motiva- 
tion, emotion, growth, and other general as- 
pects of behavior. 


Summary 


High-precision electronic methods of mo- 
tion analysis have been developed and ap- 
plied to a problem of the interaction of the 
component movements in patterned motions. 
The experiment consisted in measuring the 
variation in a travel movement of constant 
length when this movement was performed in 
relation to eight different types of manipula- 
tion. 




The results show that the duration of a 
travel movement of fixed length may change 
as much as 50 per cent when it is associated 
with different forms of manipulation. Fur- 
thermore, the degree to which this travel 
movement changes during learning depends 
on the type of manipulative movement with 
which it is related. 

The results are discussed in relation to in- 
dustrial time-and-motion study and in terms 
of their bearing on the general problem of 
integration of the component movements in 
motion. 


Received August 18, 1955. 


References 


1. Cohen, L., & Strauss, L. Time study and the 
fundamental nature of manual skill. J. con- 
sult. Psychol., 1946, 10, 146-153. 

2. Harris, S. J., & Smith, K. U. Dimensional analy- 
sis of motion: VII. Extent and direction of 
manipulation movements as factors in defin- 
ing motions. J. appl. Psychol., 1954, 38, 126- 
130. 

3. Lifson, K. A. Errors in time-study judgments of 
industrial work pace. Psychol. Monogr., 1953, 
67, No. 5 (Whole No. 355). 

4. Rubin, G., & Smith, K. U. Learning and integra- 
tion of movements in a pattern of motion. 
J. exp. Psychol., 1952, 44, 301-305. 

5. Ryan, T. A., & Smith, Patricia C. Principles of 
industrial psychology. New York: Ronald, 
1954. Pp. xiv and 534. 

6. Simon, J. R., & Smader, R. C. Dimensional analyses of motion: VIII. The role of visual discrimination in motion cycles. J. appl. Psychol., 1955, 39, 5-10.

7. Smader, R. C., & Smith, K. U. Dimensional 
analyses of motion: VI. The component move- 
ments of assembly motions. J. appl. Psychol., 
1953, 37, 308-314. 

8. Wehrkamp, R., & Smith, K. U. Dimensional 
analyses of motion: II. Travel distance ef- 
fects. J. appl. Psychol., 1952, 36, 201-206. 




The Journal of Applied Psychology 
Vol. 40, No. 4, 1956 


The Speed and Accuracy of Reading Horizontal, Vertical,
and Circular Scales¹


Norah E. Graham 


The Nuffield Department of Industrial Health, University of Durham 
(King’s College, Newcastle upon Tyne) 


A series of experiments has been designed 
to compare the human response to numerical 
information displayed on horizontal, vertical, 
and circular scales. It has already been 
shown (2, 3) that if an operator has to con- 
trol a moving pointer on a scale by turning a 
knob, then speed and accuracy are greatest 
when the horizontal scale is used. This sug- 
gests that clockwise rotation of the control 
knob is naturally associated with pointer 
movement from left to right when the con- 
trol is vertically below the display. The 
principal value of this work lies, therefore, in 
the information which it gives about display- 
control relations: the subjects could have ig- 
nored all the scale markings except the one 
on which the pointer was to be kept. The 
comparison of the three types of display is 
not complete without some measure of the 
speed and accuracy of making scale readings 
and this is the purpose of the experiments 
described here.


Method 


The subjects (Ss) read the scales from a projected 
cinefilm. Horizontal, vertical, and circular scales, 
identical to those used in the previous experiments 
(Fig. 1) were drawn in white ink on black paper. 
The pointer, which was cut out of aluminum foil 
and painted white, was placed opposite the appro- 
priate number as each scale was photographed. A 
16-mm cinecamera was used and the timing regulated by the successive-frame exposure technique. The camera was fitted with an accurate frame counter worked from the shutter shaft so that each frame was counted as it was exposed. The speed of projection was 24 frames per sec., so that, for example, a setting which the Ss were to see for ½ sec. was exposed for 12 frames. Each exposure was followed
by 8 sec. of black spacing which allowed Ss to write 
down the scale reading. The word “READY” then 


¹ Acknowledgments are due to Professor R. C. Browne for his advice in this work, and to Mr. H. Campbell, B.A., F.S.S., for statistical help; also to the Department of Photography, Medical School, King's College, for their cooperation in making the films.


appeared on the screen, for 2 sec., to prepare Ss for 
the next scale. 
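The successive-frame timing amounts to multiplying each duration by the projection speed of 24 frames per sec. A minimal sketch, using only the durations stated in the text (a ½-sec. exposure, which the text equates with 12 frames, 8 sec. of black spacing, and 2 sec. of "READY"); the function name is hypothetical.

```python
PROJECTION_FPS = 24  # frames per second, as stated in the text


def frames_for(seconds, fps=PROJECTION_FPS):
    """Number of frames to expose so that an image is projected for `seconds`."""
    frames = seconds * fps
    if frames != int(frames):
        raise ValueError(f"{seconds} s is not a whole number of frames at {fps} fps")
    return int(frames)


exposure = frames_for(0.5)  # scale image
blank = frames_for(8)       # black spacing for writing down the reading
ready = frames_for(2)       # "READY" warning
print(exposure, blank, ready, exposure + blank + ready)  # 12 192 48 252
```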

The projected circular scale was 5.1 in. in diameter 
and the horizontal and vertical scales were 16 in. in 
length. The intervals between scale markings were 
therefore the same on all three displays. The scales 
were viewed from a distance of 40 in. and appeared
approximately at eye level. The angle subtended at
the eye by the image of a scale on the screen was 
comparable to that subtended by the displays in 
the tracking experiments. 
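The statement that the intervals are the same on all three displays follows from the circumference of the circular scale: π × 5.1 in. ≈ 16.0 in., the length of the linear scales. This sketch assumes that the circular graduations run round the full circumference and that each display carries 100 tenth-unit intervals (0 to 10 in tenths); both are assumptions, not figures from the text.

```python
import math

CIRCULAR_DIAMETER_IN = 5.1  # projected diameter of the circular scale
LINEAR_LENGTH_IN = 16.0     # projected length of the horizontal and vertical scales
INTERVALS = 100             # assumption: 0 to 10 in tenths over the full length

circumference = math.pi * CIRCULAR_DIAMETER_IN
print(f"circular scale length: {circumference:.2f} in")
print(f"interval, circular:    {circumference / INTERVALS:.3f} in")
print(f"interval, linear:      {LINEAR_LENGTH_IN / INTERVALS:.3f} in")
```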

The film started with one example of each scale, which remained on the screen for 10 sec., the correct reading appearing alongside the scale after the first 5 sec. This was followed by 9 practice readings, three on each scale, and then by the test itself. In both practice and test, the exposure time was ½ sec., this value having been chosen as the result of a pilot experiment.

When choosing the test numbers, the scales were 
considered as being made up of five major segments 
—0-2, 2-4, 4-6, 6-8, and 8-10—and on each scale 
two readings were chosen in each segment. The 
subdivisions within the major segments were divided 
into two groups: 

1. .1, .4, .6, and .9, all of which are next to an
extra long graduation mark, and, 

2. .2, .3, .7, and .8, all of which are two subdivisions away from such a well-defined scale marking.

On each scale five readings were chosen in the first 
of these two groups and five in the second. Thus, 
with only three scales, five major segments and two 
types of subdivisions to be considered, it was only 
necessary for each subject to make 3 × 5 × 2 = 30
readings in order for a complete analysis of the re- 
sults to be possible. 

Sixty male university students, all studying some 
branch of engineering, acted as Ss. 
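The count of 30 readings per subject is the product of the three factors listed above. This sketch only enumerates the factor cells; it does not reproduce which specific readings were chosen, and the labels are illustrative. With 60 subjects the 1,800 readings leave 1,799 total degrees of freedom, the total given in Table 2.

```python
from itertools import product

SCALES = ("horizontal", "vertical", "circular")
SEGMENTS = ("0-2", "2-4", "4-6", "6-8", "8-10")
SUBDIVISION_TYPES = (".1/.4/.6/.9", ".2/.3/.7/.8")
N_SUBJECTS = 60

cells = list(product(SCALES, SEGMENTS, SUBDIVISION_TYPES))
readings_per_subject = len(cells)
total_readings = readings_per_subject * N_SUBJECTS
print(readings_per_subject)  # 30
print(total_readings)        # 1800
print(total_readings - 1)    # total degrees of freedom
```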


Results 


The Ss’ responses were scored as follows: 

Correct readings scored 0. 

Readings in error by ±0.1 scale units
scored 1. 

All other errors and omissions scored 2. 
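The scoring rule can be written as a small function. This is a sketch, assuming the one-point band covers errors of 0.1 scale unit in either direction and that an omitted reading is passed as None; the function name is hypothetical. Readings are compared in whole tenths so that floating-point rounding cannot disturb the comparison.

```python
from typing import Optional


def score_reading(reported: Optional[float], true_value: float) -> int:
    """Error score: 0 if correct, 1 if off by one tenth, 2 for any other error or omission."""
    if reported is None:  # omitted reading
        return 2
    # Work in integer tenths to avoid float artefacts such as 1.5 - 1.4 != 0.1.
    diff_tenths = abs(round(reported * 10) - round(true_value * 10))
    if diff_tenths == 0:
        return 0
    if diff_tenths == 1:
        return 1
    return 2


print([score_reading(r, 1.4) for r in (1.4, 1.5, 1.3, 1.6, None)])  # [0, 1, 1, 2, 2]
```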

The resulting distribution of scores was approximately normal. Marked improvement in performance occurred during the practice exposures, but the scores obtained during the experiment proper show no systematic improvement.


Speed and Accuracy in Scale Reading


Fig. 1. Horizontal, vertical, and circular scales.


The error score for each segment of the three scales is shown in Table 1. The high incidence of mistakes at the ends of the scales is very noticeable. This is to be expected on the linear scales, as it may take longer to find the pointer in these positions, but it is surprising to find a similar trend on the circular scale.

In an analysis of variance (Table 2) the three variables, subjects, scales, and units, and their first- and second-order interactions were considered.

The first-order interaction between scales and units is significant (P < .001). This means that the position on the scale in which the pointer lies has more effect on the accuracy of reading on one type of scale than on another, and has been shown to be due to the very high error at the top of the vertical scale. Many more mistakes were made between 8 and 10 on this scale than in any other region of the three displays.


Table 1
The Total Error Score for Each Segment of the Three Displays

Major Segment   Horizontal   Vertical   Circular   Total Error
0-2                     80         72         78           230
2-4                     33         58         38           129
4-6                     44         48         40           132
6-8                     18         64         41           123
8-10                    53        123         64           240
Total error            228        365        261           854


Table 2
An Analysis of Variance of the Errors in Scale Reading

Source                      df    Variance       F       P
Between Ss                  59      1.185     3.055   <.001
Between scales               2      8.521     3.614   <.05
Between units                4      9.605     4.073   <.01
Interactions
  Ss and scales            118      0.4058    1.05    N.S.
  Ss and units             236      0.3482    0.90    N.S.
  Scales and units           8      2.358     6.08    <.001
  Ss, scales, and units    472      0.3973    1.02    N.S.
Residue                    900      0.3877
Total                    1,799
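The F ratios in Table 2 can be checked from the printed variances. The sketch assumes that subjects and all interactions are tested against the residue, while scales and units are tested against the scales-and-units interaction, as the text states for scales; the printed F for units is consistent with the same choice. The degrees of freedom are also checked against the printed total.

```python
RESIDUE = 0.3877         # residue variance, 900 df
SCALES_X_UNITS = 2.358   # scales-and-units interaction variance, 8 df

# source: (variance, error variance used as denominator, printed F)
ROWS = {
    "Between Ss": (1.185, RESIDUE, 3.055),
    "Between scales": (8.521, SCALES_X_UNITS, 3.614),
    "Between units": (9.605, SCALES_X_UNITS, 4.073),
    "Ss and scales": (0.4058, RESIDUE, 1.05),
    "Ss and units": (0.3482, RESIDUE, 0.90),
    "Scales and units": (2.358, RESIDUE, 6.08),
    "Ss, scales, and units": (0.3973, RESIDUE, 1.02),
}

for source, (variance, error, printed) in ROWS.items():
    print(f"{source:<22} F = {variance / error:.3f} (printed {printed})")

DF = [59, 2, 4, 118, 236, 8, 472, 900]
print(sum(DF))  # 1799
```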

When compared with this significant inter- 
action, the variance due to the shape of the 
scale is found to be significant at the .05 
level of confidence. It was shown by means 
of the t test that the errors are significantly
greater on the vertical scale than on the hori- 
zontal or the circular scale, but the difference 
between the latter two may be attributed to 
chance. 

Another significant variable is the unit or section of the scale in which the pointer lies. In this case the t test shows that the liability to make mistakes is significantly greater at the ends of the scales, in sections 0-2 and 8-10, than in the three middle sections, 2-4, 4-6, and 6-8.

A more detailed analysis of the results 
showed that the position of the subdivision 
within the major segment (i.e., the tenths) 
had no significant effect on the accuracy of 
reading. The total error score for the group 
of readings ending in .1, .4, .6, or .9 was 457, 
while the total score for those ending in .2, 
.3, .7, or .8 was 397.
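The marginal totals of Table 1 and the subdivision totals just quoted are two partitions of the same errors, so both should add up to the same grand total. A short cross-check with the segment scores copied from Table 1 (the 4-6 row is read from the scan as 44, 48, and 40, which agrees with the printed row and column totals):

```python
# Error scores from Table 1: segment -> (horizontal, vertical, circular).
ERROR_SCORES = {
    "0-2": (80, 72, 78),
    "2-4": (33, 58, 38),
    "4-6": (44, 48, 40),
    "6-8": (18, 64, 41),
    "8-10": (53, 123, 64),
}

scale_totals = [sum(column) for column in zip(*ERROR_SCORES.values())]
segment_totals = {segment: sum(row) for segment, row in ERROR_SCORES.items()}
grand_total = sum(scale_totals)

print(scale_totals)   # [228, 365, 261]
print(segment_totals)
print(grand_total)    # 854
print(457 + 397 == grand_total)  # subdivision-group totals give the same figure
```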

When compared with the residual variance 
the differences between Ss are highly significant. The best S read 29 out of the 30


Norah E. Graham 


scales correctly, while the poorest made 21 
mistakes. 

Table 3 shows the frequency with which 
different types of error occurred on the three 
scales. The number of correct readings was 
greatest on the horizontal scale, and even if 
the readers had been allowed a margin of 
error of ± 0.1 scale units, this display would
still have ranked first in order of accuracy. 
Readings on the circular scale, on the other 
hand, were nearly always correct to within 
0.2 scale units and only one reading on this 
display was missed altogether. 

When the direction of the errors is taken 
into account it is seen that there is a tend- 
ency to overestimate a reading by 0.1 or 0.2 
scale units on the circular scale. This was 
particularly true of the four readings 0.2, 
8.6, 1.4, and 4.6. For example, 11 Ss read 
1.4 as 1.6 and 13 read it as 1.5. Only four 
Ss underestimated and called it 1.3. Or 
again, 8.6 was read as 8.8 by nine Ss, and as 8.7 by 17 Ss, whereas only two mistook it for 8.5. This accounts for the high error score at the extremities of the circular scale, particularly between 0 and 2, though it does not explain it. Such a tendency to overestimate is not peculiar to the circular scale, however. On the vertical scale errors of +0.1 occur much more frequently than those of −0.1.
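The response counts quoted above for the readings 1.4 and 8.6 can be tallied by direction of error. A small sketch (function name hypothetical) shows that overestimates outnumber underestimates by 24 to 4 and by 26 to 2 for these two readings.

```python
from collections import Counter


def direction_of_errors(responses, true_value):
    """Count subjects who over- or underestimated a reading.

    responses maps each reported value to the number of subjects giving it.
    Values are compared in whole tenths to avoid float artefacts.
    """
    target = round(true_value * 10)
    tally = Counter()
    for reported, n_subjects in responses.items():
        diff = round(reported * 10) - target
        if diff > 0:
            tally["over"] += n_subjects
        elif diff < 0:
            tally["under"] += n_subjects
    return tally


# Counts quoted in the text for two circular-scale readings.
print(direction_of_errors({1.6: 11, 1.5: 13, 1.3: 4}, 1.4))
print(direction_of_errors({8.8: 9, 8.7: 17, 8.5: 2}, 8.6))
```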


Table 3
The Frequency with Which Errors of Different Magnitude Were Made on Each Scale

                      Horizontal        Vertical         Circular
                     Number     %     Number     %     Number     %
Correct readings        413   69.0       324   54.0       390   65.0
Errors
  +1.0                    4                3                2
  −1.0                [illegible in source]
  +0.2                    6                8               34
  −0.2                [illegible in source]
  +0.1                   66              110              102
  −0.1                [illegible in source]
Other errors             11               23               11
Missed readings          12               43                1
Total                   600  100.0       600  100.0       600  100.0




Discussion 


The gross errors of ± 1.0 scale unit which
occurred in the present experiment were all 
associated with readings in the second half of 
a numbered division. Kappauf (4) remarks 
that under these conditions the scale number 
read is apt to be that nearest to the pointer. 
The tendency noted by the same author to 
“round out” readings, particularly in the first 
numbered interval of scales which start at 
zero, is not apparent in the present experi- 
ment, presumably because of instructions to 
record the zero in such cases; it may, how- 
ever, occur in practical situations. Vernon 
(6) considers that gross mistakes are also 
liable to occur near the zero on circular scales, 
but the present results confirm the finding of 
Sleight (5) that gross errors at the ends of 
a scale are less frequent on scales without a 
clearly defined break. 

The mistakes which do happen at the ends 
of the circular scale are principally local, that 
is to say, of less than one numbered scale 
division. Local errors in any part of the 
scale display a tendency to overestimation. 
This was also noted by Sleight and seems to 
have no obvious explanation. 

Sleight attributes the differences between 
the scales used in his experiment to the varia- 
tion in their “effective” area; the larger the 
area to be scanned the less accurate the read- 
ing. Such an explanation does not account, 
however, for the difference between the hori- 
zontal and vertical scales which he also found 
to be significant and which the present work 
suggests is the more important difference. From a physiological point of view, an explanation can be based on the shape of the visual field and the mechanics of eye movements. Objects that subtend an angle of more than 4° at the eye can be detected if they lie within a field whose boundaries are approximately 100° to the right or left of the point of fixation, 70° above it and 80° below it. The width of the visual field is thus considerably greater than its height, which
is one factor that might favor the reading of 
horizontal scales. This is simply another 
way of saying that the eyes are set in the 
head in a horizontal line. The linear displays as they appeared in this experiment subtended an angle of approximately 10° at the eye. No difficulty should have been experienced, therefore, in finding the pointer even
at the top of the vertical scale. The region 
of foveal vision, however, only subtends an 
angle of about 3° at the eye and, in order to 
read the scale, it is necessary to focus on the 
pointer itself. During very short exposures 
the accuracy of reading therefore depends 
upon the speed with which eye movements 
can be made. Scanning along a horizontal 
line is a relatively simple action involving 
the use of the lateral and medial recti muscles 
only. Raising or lowering the eyes, on the 
other hand, involves the joint action of the 
superior and inferior recti and the inferior 
and superior obliques. According to Duke- 
Elder (1) it has been shown by photographic 
studies that the eyes can follow lines in the 
horizontal plane more easily than in any 
other. It has been found, moreover, that 
horizontal eye movements are the most rapid 
and vertical ones the slowest. When the fact 
that people are accustomed, when reading, to 
scanning along a horizontal line is added to 
this evidence, it is not difficult to explain the
superiority of the horizontal scale. 


Summary 


1. The speed and accuracy of reading comparable horizontal, vertical, and circular scales have been studied by means of a film. Pictures of the scales were flashed on a screen at 10-sec. intervals, the exposure time being ½ sec.

2. The vertical scale is clearly less easy to 
read than either of the other two displays, 
particular difficulty being experienced near 
its ends. 

3. The success of the circular scale may be 
attributed to the fact that it presents a 
smaller area to be scanned. The shape of 
the visual field and the relative ease of mov- 
ing the eyes from side to side, rather than 
up and down, are thought to account for the 
greater accuracy on the horizontal scale. 


Received July 5, 1955. 


References 


1. Duke-Elder, W. S. Textbook of ophthalmology. London: Kimpton, 1932.

2. Graham, N. E., Baxter, I. G., & Browne, R. C. Manual tracking in response to the display of horizontal, vertical and circular scales. Brit. J. Psychol. (Gen. Sec.), 1951, 42, 155-163.

3. Graham, N. E. Manual tracking on a horizontal scale and in the four quadrants of a circular scale. Brit. J. Psychol. (Gen. Sec.), 1952, 43, 70-77.

4. Kappauf, W. E. A discussion of scale-reading habits. USAF, WADC Tech. Rep., 1951, No. 6569.

5. Sleight, R. B. The effect of instrument dial shape on legibility. J. appl. Psychol., 1948, 32, 170-188.

6. Vernon, M. D. Scale and dial reading. Flying Personnel Res. Committee Rep., 1946, No. 668.




Evaluation of a Display Incorporating Quantitative and 
Check-Reading Characteristics 


Martin I. Kurke 


U. S. Army Ordnance Human Engineering Laboratory, Aberdeen Proving Ground 


The present paper describes an evaluation 
of the validity of a new principle underlying 
dial design. It is demonstrated that a hu- 
man operator can check-read the proposed 
quantitative dial with significantly fewer 
errors and with greater speed than the con- 
ventional quantitative dial faces now in use. 
The dial design, described elsewhere in detail (4), is based upon the principle that a monitor can visually perceive and interpret simple high-contrast figures of known symbolism more readily than he can perceive and react to such a complex process as reading a scale (2, 5). In essence, the dial face is so designed that when the indicator is pointing to a portion of the scale indicating caution or danger functions of the machine system, there appears on the face of the dial a high-contrast wedge which is not present when the machine is operating within "safe and normal" limits (Fig. 1). This wedge ensures that changes in shape, area, hue, and brilliance attract the eye of the monitor.


Fig. 1. Descriptive drawing of experimental dial design. Disc A less angle a and with pointer p is mounted on disc B. Upon the latter is painted circle C less angle c. Under "Safe and Normal" conditions (II) angles a and c are aligned, but under "Red-line" conditions (III) the painted area C is exposed, presenting a contrasting colored wedge (see 4).

The engineering validity of the dial prin- 
ciple was previously investigated and post- 
flight comments were obtained from test 
pilots who flew a Bell Model 47-G helicopter 
in which a prototype dial face was installed 
within the airspeed indicator for a period of 
one month. During this time the aircraft 
was used for other flight demonstrations. In- 
formal interviews and flight test reports (1) 
elicited generally favorable comments from 
the test pilots, who indicated their belief that 
the dial design would be more efficacious in 
a panel consisting of a large number of cru- 
cial dials requiring only an occasional glance 
for monitoring than for a helicopter airspeed 
indicator. 

Method 


Four decks of 50 index cards were prepared, three 
of which had dial faces drawn on them (Fig. 2). 
All dials were two inches in diameter and were numbered clockwise from zero to ten along the periphery at 22½-degree intervals with zero at the top. The indicator was drawn to point to the integers from 0 through 10, and to ½, 1½, 8½, and 9½. On thirty of the cards in decks A, B, and C, the indicator pointed to a number between 1½ and 8½ inclusive. These represented “safe and normal” operation. The indicator on the remaining twenty cards pointed to numbers in the 0-1 and 9-10 intervals. These represented “red-line” operation. In control
deck A no further indications were made on the 
dials. In control deck B, the red-lined areas were 
marked off with a red edging along the dial periphery 
between 0-1 and between 9-10. Otherwise this deck 
was identical with A. Experimental deck C was 
drawn so that when the pointer indicated “safe and 
normal” only the pointer and number showed as in 
deck A. When the indicator pointed to “red-line” 
operation, however, a red wedge appeared on the 
dial face. This wedge increased in size
as a function of deviation from “safe and normal” 
conditions. Deck D consisted of 50 consecutively 
numbered cards. Twenty of these numbers were 
randomly chosen and drawn in red at the top of 
the card; the balance were in black. In the center 
of each card was a two-inch circle, 20 of which were chosen at random and filled in so that they appeared as black spots. The remaining 30 were left as circles.

Fig. 2. The appearance of decks A, B, and C showing “Safe and Normal” and “Red-line” displays.

Of the 33 male Ss used, 9 were Scientific and Pro- 
fessional rated enlisted men (engineers, physicists, 
and mathematicians) assigned to the Army’s Bal- 
listic Research Laboratories; 10 were civilian and 
military engineers and psychologists (5 each) em- 
ployed by the Human Engineering Laboratory; and 
14 were engineering employees of Bell Aircraft Cor- 
poration. The raw time data of the latter group 
were lost, necessitating calculation of raw time-score 
data with only 19 Ss. All other data were based
on an N of 33. 

The Ss were tested individually. Each S was in- 
formed that he would receive a shuffled deck of 
cards face down, which he was to hold in one hand. 
He was then required to turn the cards over one at 
a time and place them in two piles according to a 
separate criterion to be given for each of six card 
sorts. He was informed that accuracy was of prime 
importance and that he could correct any mistake 
provided he had not started to turn over the next 
card in the deck. He was also told that speed was 
almost as important as accuracy and to sort the 
cards as rapidly as he could consistent with accuracy. 
Each S first sorted deck D on the basis of number 
color. The sole purpose of this sorting was to en- 
able Ss to get the “feel” of the cards and practice in 
manipulating them. As on all sortings, time and error scores were recorded. However, for the first
sorting these data were not used. On trial 2, deck 
D was again sorted, this time on the basis of 
whether or not the circles were filled in. Then 
every third S (Group I) was shown deck C, and 
the mechanism of the dial it represents was explained. He then sorted “safe and normal” from
“red-line” displays. The S then received instruc- 
tions on the dial system for deck A, which he sorted 
on the same basis, followed by a similar procedure 
for deck B for his fifth sorting. Groups II and III 
performed similarly, except that their sequences for 
decks A, B, and C were counterbalanced. For the 
final card-sort, deck D was again sorted on the basis 
of difference in circles. 

In addition to practice in manipulation of the 
cards, sorts 2 and 6 had another purpose. It may 
readily be seen that the card-sorting technique meas- 
ures two things: the speed of discrimination and the 
motor response time. Some method was needed to 
eliminate the effects of motor activity from the time 
scores. Sorting deck D in trials 2 and 6 enabled Ss 
to make a discrimination taking a negligible time to 
perform. We might make the assumption, there- 
fore, that the time to sort on the basis of bright- 
ness discrimination is almost completely the motor 
response time. However, motor time changes with practice. Therefore, the mean time of trials 2 and 6 would be the most reliable estimate of motor time on trial 4, which lies midway between them. Since decks A, B, and C were sorted on trials 3, 4, and 5 equally often, subtracting each S’s mean time on trials 2 and 6 from his times on trials 3, 4, and 5 yielded the most reliable estimate of the time taken to make the discriminations in decks A, B, and C. These differences will be referred to as “adjusted time scores.”
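The arithmetic behind an adjusted time score can be sketched in a few lines of Python. This is an illustration only: the function name `adjusted_time_scores` and the trial times in the example are hypothetical, since no individual S’s data are reported. The sketch takes the mean of the two deck D circle sorts (trials 2 and 6) as the motor-time estimate and subtracts it from each dial-sorting time on trials 3, 4, and 5.

```python
from __future__ import annotations

from statistics import mean


def adjusted_time_scores(
    trial_times: dict[int, float], dial_by_trial: dict[int, str]
) -> dict[str, float]:
    """Return one adjusted time score per dial for a single S.

    trial_times maps trial number (1-6) to sorting time in seconds.
    dial_by_trial maps trials 3, 4, and 5 to the deck (A, B, or C) sorted then.
    """
    # Motor-time estimate: mean of the two circle sorts of deck D.
    motor_estimate = mean([trial_times[2], trial_times[6]])
    # Adjusted score = raw dial-sorting time minus the motor-time estimate.
    return {dial_by_trial[t]: trial_times[t] - motor_estimate for t in (3, 4, 5)}


# Hypothetical Group I subject (deck order C, A, B on trials 3-5).
example = adjusted_time_scores(
    {2: 40.0, 3: 43.0, 4: 66.0, 5: 59.0, 6: 36.0},
    {3: "C", 4: "A", 5: "B"},
)
print(example)  # {'C': 5.0, 'A': 28.0, 'B': 21.0}
```

Because the deck order is passed in explicitly, the same function serves the counterbalanced sequences of Groups II and III.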


Results 


Errors. It was felt that the differences in 
errors were so great that a statistical analysis 
would be superfluous. Thirty-three Ss, each 
making 50 discriminations on deck C, made 
a total of only one error out of 1,650 trials. 
Using the conventional “red-line” dial (deck 
B) the same Ss made 18 errors, while dials 
without any warning system yielded 39 errors 
(Fig. 3). 
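Expressed as rates, the reported error counts correspond to roughly 2.4%, 1.1%, and 0.06% of the 1,650 sorts for decks A, B, and C. The red-line dial’s count is about 0.46 of the untreated dial’s. The short Python sketch below does this arithmetic using only the counts given above.

```python
# Error counts reported in the text; each deck was sorted by 33 Ss x 50 cards.
N_SUBJECTS, CARDS_PER_DECK = 33, 50
TRIALS_PER_DECK = N_SUBJECTS * CARDS_PER_DECK  # 1,650 sorts per deck

errors = {"A": 39, "B": 18, "C": 1}  # A: no warning, B: red-line, C: wedge
rates = {deck: count / TRIALS_PER_DECK for deck, count in errors.items()}

for deck, rate in rates.items():
    print(f"Deck {deck}: {errors[deck]:>2} errors = {rate:.2%} of sorts")

# Ratio of red-line to untreated error counts (about 0.46, i.e., roughly halved).
print(f"B/A error ratio: {errors['B'] / errors['A']:.2f}")
```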

Speed. The mean raw time (N = 19) for sorting 50 cards with no warning indicator (deck A) was 73.1 sec. The conventional “red-line”
dial yielded a mean score of 69.6 sec. and the 
experimental display yielded a mean score of 
52.9 sec. The three measures had standard 
deviations of 13.0, 14.9, and 10.3 seconds, 
respectively. 

Mean adjusted time scores for the displays were 27.8 (σ = 10.0), 20.5 (σ = 9.8), and 4.3 (σ = 4.9) seconds, respectively, for dials A, B, and C.

Fig. 3. Time and error as a function of dial. (Plotted panels include raw time and adjusted time, N = 33, for Dials A, B, and C.)

The loss of a portion of the raw data precluded the reporting of anything but Student’s t test in determining the differences between means of raw time scores: $t_{AB} = 0.76$ (chance); $t_{BC} = 3.91$ ($p < .01$); and $t_{AC} = 5.17$ ($p < .001$). The differences between untreated and conventional red-lined dials, the latter and the experimental wedge, and the untreated and the experimental dials in terms of adjusted speed scores yielded $t_{AB} = 2.92$ ($p < .01$), $t_{BC} = 8.35$ ($p < .001$), and $t_{AC} = 10.61$ ($p < .001$), respectively. Owing to the above-mentioned loss of data, although adjusted scores have been calculated on the basis of 32 df, the raw scores were figured on the basis of 18 df.
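Every S sorted all three decks, and the reported degrees of freedom equal N − 1 (32 for 33 Ss, 18 for 19 Ss). The comparisons therefore appear to be correlated-samples (paired) t tests. The paper does not state its formula, so this is an inference. The minimal Python sketch below shows that computation. The function name `paired_t` and the five pairs of scores are hypothetical.

```python
from __future__ import annotations

import math
from statistics import mean, stdev


def paired_t(x: list[float], y: list[float]) -> tuple[float, int]:
    """Correlated-samples t for two dials scored on the same Ss.

    Returns (t, df), with t = mean(d) / (s_d / sqrt(N)) and df = N - 1,
    where d is each S's score on dial x minus his score on dial y.
    """
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length samples with at least 2 Ss")
    diffs = [a - b for a, b in zip(x, y)]
    standard_error = stdev(diffs) / math.sqrt(len(diffs))
    return mean(diffs) / standard_error, len(diffs) - 1


# Hypothetical adjusted scores (seconds) for five Ss on dials A and B.
dial_a = [30.0, 25.0, 28.0, 22.0, 35.0]
dial_b = [22.0, 20.0, 21.0, 18.0, 26.0]
t_ab, df = paired_t(dial_a, dial_b)
print(f"t = {t_ab:.2f}, df = {df}")
```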


Discussion 


An unexpected result of this study is that 
the two control groups differed so little on the 
basis of raw time scores. Although no objec- 
tive evidence to support it seems to be avail- 
able, the widespread practice of red-lining 
dials as in deck B to indicate an abnormal 
state in a machine system attests to an al- 
most universal acceptance of the red-line as 
an aid to check-reading of dials. It was surprising, therefore, to learn from the data that
although red lining halves the error score, in 
terms of raw time scores any advantage in 
speed of reading obtained by use of the con- 
ventionally red-lined dial over the untreated 
display is due to chance alone. These re- 
sults, of course, apply specifically to dials 
read for the purpose of card-sorting. If, how- 
ever, the motor aspects of the perceptual- 
motor task are removed, the adjusted scores 
indicate that red lining does provide a significant reduction in reading time at the .01 level of confidence.

Experimental dial C proved to be superior 
to the untreated control at the .001 level of 
confidence, and to the red-lined dial at the 
same level when compared on the basis of 
adjusted scores, but only at the .01 level if 
raw scores are considered. A possible explanation for the apparent superiority of the experimental dial lies in the fact that no pointer reading is necessary in order to check-read the display. Only the simplest of discriminations is required. This is in accordance with the view that a good display is easily read and reduces complexity; critical displays are very visible, and changed indications are easily detectable (6). The inherent features of the display also agree with
the principle that “the instrument shall be 
designed in such a way that the reader will 
not have to remember special rules about 
them in order to read without error” (3). 
Presumably remembering numerical limits 
falls within the category of “special rules.” 


Summary and Conclusions 


By use of a card-sorting experiment, a com- 
parison of three dial designs was made from 
the standpoint of accuracy and the speed of 
check-reading. Within the limits of this ex- 
periment, it was demonstrated that the con- 
ventional method of red lining a dial to in- 
dicate a deviation from “safe and normal” 
operation is significantly better than no “red- 
line” indication at all provided the criteria are 
errors, or reading time isolated from associ- 
ated motor activity. It was also demon- 
strated that the experimental dial design prin- 
ciple is significantly more efficient than the 
other two, whichever of the three measures is used in the comparison. It is suggested that the experimental dial design is more easily read because a simpler form of visual discrimination is required than for the task of reading the other dials.


Received September 23, 1955. 


References 


1. Cannon, J. A. In-flight evaluation of “A quali- 
tative instrument face: CDS-16-2-54.” Buf- 
falo, N. Y.: Bell Aircraft Corp. Memo ENG: 
12:4:0914-1:JAC, 1954. 


2. Chapanis, A., Garner, W. R. & Morgan, C. T. 
Applied experimental psychology. New York: 
Wiley, 1949. 

3. Kappauf, W. E. A discussion of scale reading 
habits. USAF, WADC, Tech. Rep., 1951, No. 
6569. 

4. Kurke, M. I. A qualitative instrument face. 
Aero Digest, 1955, 70, 24. 

5. Reed, J. B. The speed and accuracy of discriminating differences in hue, brilliance, area, and shape. Port Washington, L. I., N. Y.: U.S. Navy, Special Devices Center, 1951. (Tech. Rep. 131-1-2.)

6. Senders, V. L., & Cohen, J. The display charac- 
teristics of a good instrument. Abstr. Air- 
borne Electronics Conf. (Inst. Radio Engnrs.), 
1953, 27-29. 




Comprehension by Reading versus Hearing 


Wilse B. Webb and Edward J. Wallon 
U. S. Naval School of Aviation Medicine 


Can a person best comprehend a body of 
information by reading or by hearing? This 
turns out to be a difficult question to answer. 
As pointed out in a recent review, the answer 
is influenced by at least five factors: the type 
of material, the method of presentation, the 
comprehension measures employed, the per- 
ceivers used, and the surrounding conditions 
(2). In a given situation one is most fre- 
quently forced to make an “educated” guess. 

We have recently collected a considerable 
amount of comparative data on this problem. 
Our findings are presented to improve our 
“guesses” in situations similar to those ap- 
proximated by our experimental design. Our 
particular conditions were as follows: (a) 
The material was unfamiliar story-form ma- 
terial containing considerable detail; (b) the 
methods of presentation used were by tape 
recording; self-paced, single “read through” 
of printed material; study of printed mate- 
rial; and simultaneous reading-listening; (c) 
comprehension was measured by a true-false 
examination both immediately and for a 24- 
to 48-hour recall; (d) the subjects were 
highly selected college-level males and the 
material was given under standard testing 
conditions. 


Procedure 
Subjects 


The subjects were young, healthy males between 
the ages of 18 and 25. All had a minimum of two 
years’ college training or the equivalent. All had 
been screened as acceptable for naval flight training 
by a general intelligence type test, a biographical in- 
ventory, a mechanical comprehension test, and a 
spatial apperception test. The subjects were tested 
in groups of approximately 30 subjects each. 


Test Material 


It was desirable to use material which had low familiarity for the subjects. To this end stories of Greek mythology taken from Bulfinch’s Mythology were used (1). Six stories were chosen: Pyramus and Thisbe (947 words), Juno and Io (884 words), Diana and Acteon (770 words), Nisus and Scylla (793 words), Callisto (514 words), and Cephalus and Procris (741 words). In most cases the stories were taken intact; in several a number of passages were deleted without affecting the story continuity.

1 Opinions and conclusions contained in this report are those of the authors. They are not to be construed as necessarily reflecting the view or the endorsement of the Navy Department.

Preliminary analyses indicated that a minimum of three stories per test was required to give adequate reliability, enough material on which to construct questions, and sufficient length to achieve adequate item difficulty.

To meet these needs, two forms of the tests were 
developed. Form I consisted of three stories: Py- 
ramus and Thisbe, Juno and Io, Diana and Acteon; 
Form II was made up of three stories: Nisus and Scylla, Callisto, and Cephalus and Procris. The written form of these tests is available from the files of
the Psychology Laboratory of the Naval School of 
Aviation Medicine. 

The two sets of stories were used so that comparisons of differences between methods of administration within subjects could be analyzed. These results are to be reported elsewhere.

That the material was essentially unfamiliar is attested by the figures in Table 1, which gives, for two representative samples of the population, the number of subjects familiar with none, one, two, or all three of the stories in each form.

Table 1
Familiarity with Story Material Prior to Testing

No. of Stories          Form I      Form II
Familiar to Subject     (N = 73)    (N = 37)
0                       59          34
1                       10          3
2                       2           0
3                       2           0


Measures of Comprehension 


For each story 16 true-false questions were con- 
structed. On the basis of an item analysis of the 
first several presentations of each form several items 
were eliminated from each form as either not dis- 
criminating between the upper and lower fourths of 
the distributions or as having unfortunate pass-fail 
proportions. The resulting numbers of items per story were:

Form I                        Form II
Pyramus and Thisbe    15      Nisus and Scylla        14
Juno and Io           15      Callisto                16
Diana and Acteon      15      Cephalus and Procris    13
Total                 45      Total                   43
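The item screening described above can be sketched in Python. For each item, the sketch computes the pass proportion p. It also computes a discrimination index D: the proportion correct in the upper fourth of total scores minus the proportion correct in the lower fourth. The function name `item_statistics` and the example responses are hypothetical. The paper does not give the cutoffs used to reject items, so the sketch applies none.

```python
from __future__ import annotations


def item_statistics(item_correct: list[bool], total_scores: list[float]) -> tuple[float, float]:
    """Return (p, D) for one true-false item.

    p: proportion of all examinees answering the item correctly.
    D: proportion correct in the upper fourth of total scores minus the
       proportion correct in the lower fourth.
    """
    n = len(total_scores)
    if n != len(item_correct) or n < 4:
        raise ValueError("need matching lists with at least 4 examinees")
    k = n // 4  # size of each extreme fourth
    ranked = sorted(range(n), key=lambda i: total_scores[i])
    lower, upper = ranked[:k], ranked[-k:]
    p = sum(item_correct) / n
    d = sum(item_correct[i] for i in upper) / k - sum(item_correct[i] for i in lower) / k
    return p, d


# Hypothetical responses of eight examinees to one item.
totals = [20, 25, 30, 35, 40, 45, 50, 55]
correct = [False, False, True, False, True, True, True, True]
print(item_statistics(correct, totals))  # (0.625, 1.0)
```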


Method of Presentation 


Auditory. The two forms of the tests were tape 
recorded. The following set of instructions was first 
recorded: 

“In this recording you will be presented with a 
group of three stories. Listen to each carefully. 
After the stories have been concluded, they will be 
followed by a series of statements. You are to in- 
dicate whether each statement is true or false by 
filling in between the dotted lines of your IBM an- 
swer sheet under the small number “1” if the state- 
ment is true, or by filling in between the dotted lines 
under the small number “2” if the statement is 
false.” 

The test (Form I or Form II) was then presented. 
Following the reading of the three stories constitut- 
ing a given form, the questions were then given on 
the tape. Answers were marked on IBM answer 
sheets.

Read through. The two forms of the tests were 
mimeographed. The following set of instructions 
was given verbally: 

“You have each been given three sets of reading 
material. You will be asked to read these materials 
following which you will be given a brief test con- 
taining true and false statements which you are to 
answer.

“When the signal is given, you are to turn over 
your papers and begin reading. It is not necessary 
to read fast or hurry. This is not a test of reading 
speed. You are to read at your usual speed. Read 
through each of the stories once, and only once. 
Do not reread or review any sentences. Read each 
statement just once and go on to the next sentence 
without pausing between sentences to memorize or 
study the material you have gone over. Do not re- 
turn to any parts of the material you have already 
read.” 

The subjects were then given a set of mimeo- 
graphed questions with IBM answer sheets. Again, 
they were told to read the questions only once and 
not to delay on any question. They were to answer 
on the IBM answer sheets. 

Read-study. The subjects were given the mimeo- 
graphed material and told to study the material. 
They were permitted to study the material for a pe- 
riod equivalent to the time required to present the 
material auditorily; this was seventeen minutes. 
The stories were then taken up and the subjects 
were given the mimeographed questions and IBM 
answer sheets. They were then instructed to answer 
the questions. Again they were given the time required to present the material verbally.

Auditory-read. The subjects were given the mimeo- 
graphed stories, told that these stories had been re- 
corded, and instructed to listen to the stories and 
use the written material as they so desired. They 
were informed that an objective exam would be 
given on the material of the stories and that the 
written material would be taken up at the end of 
the recording. At the end of the recording the 
stories were taken up and written questions and 
IBM answer sheets were passed out. The recorded 
questions were played and the subjects answered on 
IBM answer sheets. The written questions were 
taken up along with the answer sheets at the end 
of the recording. 

The procedure for presenting the stories varied 
among the different groups. Some groups received 
only one type of administration. Several groups re- 
ceived the two forms of the test under the same 
type of administration. Still other groups received 
the tests under two different kinds of administra- 
tion. These tests were always given at least 24 
hours apart. No practice effect was found under 
these conditions, so each test has been considered independently.

In all of these presentations the subjects were 
seated in a group testing room and the tests were 
administered along with other tests as a part of the 
routine procedure of entering the Training Com- 
mand. 


Results 


Reliability of Measures 


The forms and methods of administration yielded the split-half reliability estimates given in Table 2. These are odd-even correlations corrected by the Spearman-Brown formula.
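The Spearman-Brown step-up for a test of doubled length is $r_{\text{full}} = \frac{2\,r_{\text{half}}}{1 + r_{\text{half}}}$. For example, a half-test correlation of .60 becomes .75. The minimal Python sketch below follows the odd-even procedure. The function names and the 0/1 item-score matrix are hypothetical.

```python
from __future__ import annotations

import math


def pearson_r(x: list[float], y: list[float]) -> float:
    """Product-moment correlation between two equal-length lists."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def spearman_brown(r_half: float) -> float:
    """Step a half-test correlation up to full-test length."""
    return 2 * r_half / (1 + r_half)


def split_half_reliability(item_scores: list[list[int]]) -> float:
    """Odd-even split-half reliability, corrected by Spearman-Brown.

    item_scores holds one list of 0/1 item scores per examinee, in item order.
    """
    odd = [sum(items[0::2]) for items in item_scores]   # items 1, 3, 5, ...
    even = [sum(items[1::2]) for items in item_scores]  # items 2, 4, 6, ...
    return spearman_brown(pearson_r(odd, even))


# Hypothetical scores of four examinees on four items.
scores = [[1, 1, 1, 1], [1, 0, 1, 0], [0, 1, 0, 0], [0, 0, 0, 0]]
print(round(split_half_reliability(scores), 3))
```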


Comparison of Methods of Presentation 


The means, sigmas, and number of sub- 
jects for the methods of presentation by 
forms are given in Table 3. 

Separate analyses of variance by methods were completed for the two forms of the comprehension measures. Significant F ratios between methods were obtained in each analysis. These results are given in Table 4.

Table 2
Split-half Reliabilities Among Methods of Administration

Method           N     Form I     N     Form II
Auditory         118   .76        113   .64
Read-through     182   .72        122   .62
Read-study       46    .72        94    .65
Auditory-read    36    .46        36    .34

Table 3
Means, Sigmas, and Ns for Methods of Presentation

                       Form I                    Form II
Method           N     Mean    SD         N     Mean    SD
Auditory         118   35.60   5.12       113   30.09   3.69
Read-through     182   34.55   5.02       122   30.15   4.00
Read-study       46    37.04   4.94       94    34.48   3.96
Auditory-read    36    37.33   3.21       36    33.36   2.57

Using the error variance obtained in analyses for the separate forms, t tests between the various methods were made. These results are presented in Table 5.

Table 5
Table of t Ratios Obtained Between the Various Methods of Presentation

Form I           Read-study    Auditory    Auditory-read
Read-through     3.07*         1.81        3.09*
Read-study                     1.67        .26
Auditory                                   1.84

Form II          Read-study    Auditory    Auditory-read
Read-through     [illegible]
Read-study       [illegible]
Auditory                                   4.43**

* Significant at the .01 level.
** Significant at the .001 level.

The results of the statistical analyses may be summarized as follows:

1. The material was acquired equally effectively by one hearing or by one read-through.

2. Study of the material for a period of time equal to that required for one verbal presentation is more effective than either the verbal presentation or a single read-through.

3. Simultaneous reading and hearing of the material was more effective than either a single reading or hearing but no more effective than studying the material for an equal period of time.

Table 4
Analysis of Variance by Methods

Source of Variation    Sum of Squares    df     Variance    F
Form I
Between methods        394.7             3      131.6       5.41*
Within methods         9189.3            378    24.3
Total                  9584.0            381
Form II
Between methods        1402.7            3      467.6       32.47*
Within methods         5197.0            361    14.4
Total                  6599.7            364

* An F of 3.84 is significant at the .01 level.
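The Form I entries of Tables 4 and 5 can be checked from the summary statistics alone. This assumes each t ratio is the mean difference divided by $\sqrt{MS_{\text{within}}(1/n_1 + 1/n_2)}$, with $MS_{\text{within}}$ taken from Table 4. That is our reading of “using the error variance obtained in analyses for the separate forms,” not a formula stated by the authors. Under this reading, the Python sketch below reproduces the Form I F ratio and t ratios to within about .01. The constant and function names are hypothetical.

```python
import math

# Form I summary statistics from Table 3 (N, mean) and Table 4 (SS, df).
FORM_I_GROUPS = {
    "Auditory": (118, 35.60),
    "Read-through": (182, 34.55),
    "Read-study": (46, 37.04),
    "Auditory-read": (36, 37.33),
}
SS_BETWEEN, DF_BETWEEN = 394.7, 3
SS_WITHIN, DF_WITHIN = 9189.3, 378

ms_between = SS_BETWEEN / DF_BETWEEN
ms_within = SS_WITHIN / DF_WITHIN  # the "error variance", about 24.3
f_ratio = ms_between / ms_within   # about 5.41, as in Table 4


def t_between(method_1: str, method_2: str) -> float:
    """t ratio for two independent groups, using the pooled ANOVA error variance."""
    (n1, m1), (n2, m2) = FORM_I_GROUPS[method_1], FORM_I_GROUPS[method_2]
    return abs(m1 - m2) / math.sqrt(ms_within * (1 / n1 + 1 / n2))


print(f"F = {f_ratio:.2f}")
for a, b in [("Read-through", "Read-study"), ("Read-through", "Auditory"),
             ("Read-through", "Auditory-read"), ("Read-study", "Auditory"),
             ("Read-study", "Auditory-read"), ("Auditory", "Auditory-read")]:
    print(f"{a} vs {b}: t = {t_between(a, b):.2f}")
```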
Conclusions

It was noted in the introduction that broad 
generalizations regarding reading versus hear- 
ing are hard to come by. However, one can 
generalize within a given class of conditions. 
Let us recall the conditions of the experiment: 


1. The subjects were college level and 
screened on intelligence (the average ACE is 
approximately 124.64). 

2. The material was in story form with 
considerable detail containing approximately 
2,300 words. 

3. The method of measurement was ex- 
tensive true-false questions covering both de- 
tail and general aspects. 

Under the conditions of the experiment the results are quite clear, and the two test forms yielded complementary findings. These results were:

1. A single read-through of the material 
and hearing the material read once resulted 
in equally effective comprehension. 

2. Studying (reading) the material for a period of time equal to the length of time required for verbally presenting the material resulted in significantly greater comprehension when compared with a single read-through or auditory presentation.


3. Reading and hearing simultaneously the 
material was more effective than either read- 
ing the material through once or hearing the 
material but not significantly different from 
the results of studying the material. 


In general, then, for the conditions of the 
experiment, since reading is more rapid for 
one-time acquaintance with material, reading 
is the preferred method. If as much time is available for reading as for auditory presentation, significantly more information may be obtained by reading.


Received October 3, 1955. 


References 


1. Bulfinch, T. Mythology. New York: J. M. Dent, 1931.

2. Henneman, R. H. Vision and audition as sensory 
channels for communication. Quart. J. Speech, 
1952, 38, 161-166. 




Role Perceptions of Successful and 
Unsuccessful Supervisors 


E. E. Ghiselli 


University of California 


and R. Barthol 


Pennsylvania State University 


Ghiselli and Brown have offered a concep- 
tual framework for describing an individual’s 
position in an organization, and his relation- 
ships with others (3). The position and the 
relationships of an individual are described 
in terms of prescribed and perceived roles. 
The prescriptions and perceptions are either 
those of the individual himself or those of 
others in the organization. Following New- 
comb, role prescriptions are thought of as ex- 
pectancies or anticipations of particular types 
of behavior (7). They constitute the formal 
characteristics of the individual’s role either 
as set by himself or by others. Role percep- 
tions, on the other hand, are taken to refer to 
the roles the individual sees himself as actu- 
ally fulfilling or the roles that others see him 
as actually fulfilling. 

One therefore can differentiate four types 
of roles: (a) self-prescribed roles (roles that 
the individual believes he should adopt), (b) 
self-perceived roles (roles that the individual 
sees himself as actually filling), (c) roles pre- 
scribed by others (roles others expect the in- 
dividual to adopt), and (d) roles perceived 
by others (roles others see the individual as 
actually filling). The relationships among 
these roles are of some interest. For example, 
when there is little correspondence among 
roles, difficulties among individuals and groups 
may be anticipated since then individuals will 
not be behaving in expected ways. 

One of these relationships between roles is 
of special importance in the industrial situa- 
tion, that is to say, the relationship between 
self-perceived roles and roles perceived by 
others. This relationship involves the degree 
to which the characteristics an individual be- 
lieves he possesses correspond with the be- 
havior that others believe typify him. There 
is particular significance to this relationship 


when the viewing of the behavior, that is, the 
roles perceived by others, is done by manage- 
ment. This perception is contained either in 
formal statements, e.g., merit or performance 
ratings, or in informal opinions. In either 
event these views will be important deter- 
miners of the kinds of administrative action 
management will take concerning the indi- 
vidual. 

In a sense management is a self-perpetua- 
ting group. By and large it is the sole agent 
in the choice of its members, and it maintains 
or terminates the membership of an indi- 
vidual in it. To be sure, in a hierarchical 
organization, such as an industrial organiza- 
tion, management can be thought of as a se- 
ries of levels. Therefore, it is possible to dif- 
ferentiate ordered groups within management. 
The process of selection and maintenance of 
membership, however, necessarily would be re- 
peated from group to group as an individual 
progresses “up the ladder,” except that the 
top group traditionally has the power to make 
its wishes felt in every group below it. 

One therefore can ask the question, what 
qualities do subordinates see in themselves 
when their behavior is judged by higher man- 
agement to conform with that which manage- 
ment expects of them? In other words, what 
are the self-perceptions of individuals in lower 
management whose behavior higher manage- 
ment perceives as conforming to the stand- 
ards that higher management itself imposes? 

A review of the literature suggests that 
higher management and the workers do not 
agree on the qualities that make a good mid- 
dle management supervisor (4, 6). It fur- 
ther indicates that middle management does 
not necessarily recognize the kind of super- 
vision it is giving to line workers (1). This 
paper limits itself to the viewpoints of the line supervisor and his superior and does not
include the presumably different viewpoint 
of the worker. 


Method and Procedures 


The ways of measuring or describing self-percep- 
tions are many. In recent years adjective check lists 
have seemed to give fruitful results because of the 
ease with which they can be interpreted. Some kind 
of adjective check list, then, was decided upon. 
There are certain problems, however, in the use of 
ordinary adjective check lists. With such lists the 
individual merely accepts or rejects a given adjective. 
Therefore he can reject items that place him in an 
unfavorable light. When there is reason to suspect 
that this tendency will occur among subjects, then 
some other device seems indicated. One way of 
overcoming this difficulty is to use the forced-choice 
method. In this procedure the individual is forced 
to choose between a pair of alternatives that are 
equally desirable or undesirable. By this means re- 
jection of all unfavorable items is avoided. 

One of the present writers developed a forced- 
choice adjective check list which shows considerable 
promise in minimizing the effects of faking (2). 
Hence, this instrument was adopted. It consists of 
64 pairs of adjectives, both members of each pair 
referring to traits approximately equal in social de- 
sirability. Thirty-two of the pairs contain adjec- 
tives referring to desirable traits, and the remaining 
32 contain adjectives referring to undesirable traits. 
For the former, the respondent chooses the alterna- 
tive he believes most describes him, and for the 
latter he chooses the one he believes least describes 
him. 
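The response format can be modeled with a small Python data structure. On a desirable pair, the chosen word is one the respondent sees himself as. On an undesirable pair, the chosen word is one he does not see himself as, which matches the two halves of Table 1 in the Results. The class and function names are hypothetical. The two pairs are taken from Table 1; the respondent’s choices are invented.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class AdjectivePair:
    """One forced-choice item: two adjectives of roughly equal social desirability."""
    first: str
    second: str
    desirable: bool  # True: pick the word that MOST describes you; False: LEAST


def self_description(pair: AdjectivePair, chosen: str) -> str:
    """Translate a respondent's choice into the self-perception it expresses."""
    if chosen not in (pair.first, pair.second):
        raise ValueError(f"{chosen!r} is not an alternative in this pair")
    if pair.desirable:
        return f"sees self as {chosen}"
    return f"does not see self as {chosen}"


# Two pairs that appear in Table 1; the respondent's choices are hypothetical.
items = [AdjectivePair("energetic", "ambitious", True),
         AdjectivePair("noisy", "arrogant", False)]
for item, choice in zip(items, ["energetic", "noisy"]):
    print(self_description(item, choice))
```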
The inventory was completed by 267 persons, all 
of whom were first-line supervisors. In order to ob- 
tain as wide a sample as possible, cases were drawn 
from seven different organizations distributed geo- 
graphically from the far east to the far west. In- 
cluded were four groups of industrial foremen and 
supervisors numbering, respectively, 63, 24, 22, and
20; and three groups of office supervisors number- 
ing, respectively, 91, 26, and 21. 

The persons in each group were rated by their 
superiors. A different rating scale was used in each 
different organization. The numbers of steps on 
these scales ranged from two to sixty, but in every 
case the ratings dealt directly with the degree to 
which the individual was effective in performing his 
job as a first-line supervisor. In some of the organi- 
zations the ratings were made by a single individual, 
while in others they were the average of the ratings 
of two or three persons. The ratings were accomplished either by the supervisors’ immediate superiors
or by superiors two levels higher. On the basis of 
these ratings by their management, the supervisors 
within each group were divided into high and low 
subgroups. An attempt was made to divide each group in half, but owing to the distributions of
ratings it was impossible to achieve this exactly. 


The final two subgroups formed by all cases com- 
prised 157 cases rated high and 110 cases rated low. 
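Each organization used its own rating scale, so the high/low division has to be made within each group. Ties in the ratings can make an exact halving impossible. The paper does not state its exact splitting rule. The Python sketch below uses one plausible rule, which is an assumption: take the cut point that comes closest to an even split without separating supervisors who received the same rating. The function name and ratings are hypothetical.

```python
from __future__ import annotations


def split_high_low(ratings: dict[str, float]) -> tuple[list[str], list[str]]:
    """Split one organization's supervisors into high- and low-rated subgroups.

    Chooses the cut point that comes closest to an even split without
    separating supervisors who received the same rating.
    """
    distinct = sorted(set(ratings.values()))
    if len(distinct) < 2:
        raise ValueError("all ratings are tied; no split is possible")
    half = len(ratings) / 2

    def n_high(cut: float) -> int:
        return sum(1 for r in ratings.values() if r >= cut)

    # Candidate cuts: every distinct rating above the lowest one.
    best_cut = min(distinct[1:], key=lambda cut: abs(n_high(cut) - half))
    high = sorted(name for name, r in ratings.items() if r >= best_cut)
    low = sorted(name for name, r in ratings.items() if r < best_cut)
    return high, low


# Hypothetical ratings on a three-step scale: a 3/3 split is possible here.
print(split_high_low({"a": 3, "b": 3, "c": 3, "d": 2, "e": 1, "f": 1}))
# Hypothetical two-step scale with tied ratings: an even split is not.
print(split_high_low({"s1": 1, "s2": 2, "s3": 2, "s4": 2, "s5": 1}))
```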


Results 


Using the two groups of high- and low-rated
supervisors, an item analysis of the
forced-choice inventory was performed. Of
the 64 adjective pairs, 18, or a little more
than one in four, differentiated between the
high- and low-rated supervisors at the 5%
level of significance or better. These
pairs are given in Table 1. The word on the
left was selected by the high-rated supervisors
and the word on the right by the low-rated
supervisors. With the forced-choice technique,
any item that discriminates between two groups
necessarily consists of two alternatives, one
applying to the first group and the other
applying to the second. As indicated earlier,
on those items involving socially desirable
traits the respondent indicates which
alternative he believes most characterizes
him, and on those items

Table 1 


Items Differentiating High- and Low-Rated Supervisors

Superior Supervisors          Inferior Supervisors

See themselves as:            See themselves as:
  energetic                     ambitious
  loyal                         dependable
  kind                          jolly
  planful                       resourceful
  clear-thinking                efficient
  enterprising                  intelligent
  progressive                   thrifty
  poised                        ingenious
  steady                        sociable
  appreciative                  good-natured
  responsible                   reliable

Do not see themselves as:     Do not see themselves as:
  noisy                         arrogant
  affected                      moody
  shallow                       stingy
  unstable                      frivolous
  nervous                       intolerant
  opinionated                   pessimistic
  self-pitying                  hard-hearted


Role Perceptions of Supervisors 


involving socially undesirable traits he
indicates which alternative least characterizes
him. In Table 1 these two types of items are
grouped separately.
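The item analysis can be illustrated for a single adjective pair. The article does not name the significance test it used; this sketch assumes a Pearson chi-square on the 2 × 2 table of group (high, low) by alternative chosen (left, right), with one degree of freedom and no continuity correction. Only the group sizes of 157 and 110 come from the study; the item counts and the function name are hypothetical.

```python
import math


def item_chi_square(high_left, high_n, low_left, low_n):
    """Pearson chi-square (1 df, no continuity correction) for one forced-choice item.

    high_left / low_left: how many high- / low-rated supervisors chose the left word.
    Returns (chi_square, p_value).
    """
    observed = [
        [high_left, high_n - high_left],
        [low_left, low_n - low_left],
    ]
    n = high_n + low_n
    row_totals = [high_n, low_n]
    col_totals = [high_left + low_left, n - high_left - low_left]
    if min(col_totals) == 0:
        # Everyone chose the same word, so the item cannot discriminate.
        return 0.0, 1.0
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed[i][j] - expected) ** 2 / expected
    # With 1 df, chi-square is a squared standard normal: P(X > chi2) = erfc(sqrt(chi2 / 2)).
    p = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p


# Hypothetical item: 95 of 157 high-rated and 50 of 110 low-rated chose the left word.
chi2, p = item_chi_square(95, 157, 50, 110)
print(f"chi-square = {chi2:.2f}, p = {p:.3f}, differentiates at 5%: {p < 0.05}")
```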

The adjectives in the first column of Table 
1 give those self-perceptions of superior su- 
pervisors which differentiate them from in- 
ferior supervisors. The adjectives in the 
second column give the reverse picture.
Attempts to form a total picture of an
individual or a group from a list of checked
adjectives inevitably bring about a certain
amount of disagreement. Nevertheless, we
have attempted to form integrated pictures of
the self-perceptions of the two groups from
the adjectives checked by them.

The “good” supervisor sees himself as ac- 
tive, purposeful, and forward looking. He is 
favorably disposed toward his company and 
identifies himself with his job. He views his
responsibilities broadly, that is, as having a
job to do rather than a series of assigned
tasks. He feels that he must exercise a certain
independence of thought and action: plans
and decisions are an integral part of his work 
and cannot be left solely to his superiors. 
His orientation toward production is through 
people. He sees himself as respecting the 
rights and dignity of others, but is somewhat 
reserved. He considers himself to be stable 
and to display an evenness of temperament. 
He feels that he is worthy of the respect and 
confidence of others and that other people 
can trust him. One gets the over-all impres- 
sion of maturity and calmness. 

The most outstanding self-perception of the 
“poor” supervisor is his sales approach to
human relations. He sees himself as a good
fellow who is well liked, but he does not show
any need to understand and respect others.
His chief concern seems to be the impression 
he makes on others. He seems to have a nar- 
row approach to his job and sees himself as 
being highly skilled in carrying out instruc- 
tions. He gives no indication of leadership 
qualities, but instead relies on his own in- 
genuity and intelligence to complete a job. 
He tends to be self-oriented rather than com- 
pany-oriented; his efforts are for his own ends 
rather than those of the company. He does 
believe, however, that he possesses the
qualities that management could well use to
advantage.

These descriptions are in accord with the 
findings that poor supervisors are more pro- 
duction-oriented than are good supervisors 
(5). It is our interpretation that the poor 
supervisor tends to view production as an end 
in itself and as his personal responsibility. 
The good supervisor tends to view
production as a means to an end (over-all
company success) and to see his main
responsibility as working with the people who
are the direct producers.


Discussion 


The generalizations drawn from the obtained
results rest on at least two assumptions:
(a) The self-perceptions shown by the
supervisors are approximately in accord with
higher management’s perceptions of these same
supervisors. That is to say, higher management
sees the good supervisors as having the same
qualities that the good supervisors see in
themselves, and the same is true for the poor
supervisors. (b) The differences in the
self-descriptions reflect the qualities that
distinguish the good supervisors from the poor
supervisors.

With these assumptions in mind we can 
offer some conclusions concerning the role 
prescriptions of higher management for first- 
line supervisors. Higher management ap- 
proves of those supervisors whose attitudes 
seem to be similar to those traditionally held 
by higher management. We referred earlier 
to the hierarchical groups that comprise an 
industrial organization. This study supports 
the notion that the members of the lower 
echelons who are like the members of the 
higher echelons are most likely to win
approval. This is probably one of the factors
that lead toward stability in organizations
since, although the individual members change,
the attitudes and approach tend to remain the
same.

Higher management wants the lower level 
supervisors to have initiative and energy. 
The supervisors should be willing to assume 
responsibility, not only for implementing in- 
structions, but for deciding what must be 
done in order to carry out the mission of the 




organization. Management is not looking for 
the old-fashioned driving kind of supervisor 
who bullies his men to do a job, nor for a 
supervisor who tries to operate on the basis 
of popularity and friendliness. The super- 
visor who respects subordinates and super- 
visors alike, and furthermore, who views his 
own self-respect and integrity as important, 
is apparently approved by management. 

It is reasonable to ask why the poor su- 
pervisor is the way he is. He does not will- 
fully try to be a poor supervisor. He per- 
sistently does the wrong thing, and this is 
possibly because he thinks these behaviors are 
expected of him. We assume that there is 
some foundation for his beliefs and that they 
arise from a misinterpretation of the expecta- 
tions of higher management. The authors 
suspect that part of the trouble arises from a 
misunderstanding of the precepts in current 
thinking about proper supervision. Among 
these precepts we might find the following: 
The good supervisor (a) has the good will of 
his subordinates, (b) does his job with intel- 
ligence and ingenuity, (c) is reliable and 
conscientious, (d) wants to succeed, (e) 
“sells” his orders rather than dictates them. 

Presumably no one would quarrel with 
these statements. A re-examination of Table 
1 shows that the poor supervisor sees himself 
as having all of these qualities and yet higher 
management does not approve. Two major 
things seem to be missing: (a) respect for 
other individuals, and (b) identification with 




the job. We hypothesize that higher
management and the good supervisor are
ego-involved in their jobs, while the poor
supervisor views his job as a way to make a
living. We further hypothesize that current
management training programs frequently
mislead some supervisors by presenting human
relations as a combination of propaganda and
sales technique without making it clear that
other human beings are involved. The dignity
of the other person is fundamental to effective
human relations. It is perhaps not so much a
technique as an attitude.


Received October 11, 1955. 


References 


1. Fleischman, E. A. The measurement of leadership 
attitudes in industry. J. appl. Psychol., 1953, 
37, 153-158. 

2. Ghiselli, E. E. The forced-choice technique in
self-description. Personnel Psychol., 1954, 7,
201-208.

3. Ghiselli, E. E., & Brown, C. W. (2nd Ed.) Per- 
sonnel and industrial psychology. New York: 
McGraw-Hill, 1955. 

4. Halpin, A. W. The leadership behavior and
combat performance of airplane commanders. J.
abnorm. soc. Psychol., 1954, 49, 19-22.

5. Katz, D., et al. Productivity, supervision, and 
morale among railroad workers. Ann Arbor: 
Survey Research Center, Univer. of Michigan, 
1951. 

6. Moore, J. V., & Smith, R. G., Jr. Some aspects 
of non-commissioned officer leadership. Per- 
sonnel Psychol., 1953, 6, 427-443. 

7. Newcomb, T. M. Social psychology. New York: 
Dryden, 1950. 


The Journal of Applied Psychology 
Vol. 40, No. 4, 1956 


Job Expectancy and Survival 


Joseph Weitz 


Life Insurance Agency Management Association 


Introduction 


One problem constantly plaguing research- 
ers in the area of opinion and attitude meas- 
urement is that of causality. In job satisfac- 
tion studies, for example, one is never sure 
whether workers are high producers because 
they are happy or are happy because they 
are high producers (if indeed there is any re- 
lationship). Similarly, where termination- 
survival is used as a criterion, we do not 
know if some unfavorable attitude led to 
termination or if it is a rationalization after 
termination has been decided upon. 

One method of evaluating such relation- 
ships is to study the variables experimentally. 
In some cases it is possible to introduce a 
variable into a situation and see if it has the 
predicted effect. Such is the nature of the 
study reported here. 

In an investigation (1) of job satisfaction 
of life insurance agents, we found that those 
agents who said the manager misrepresented 
the job or job possibilities during the hiring 
interview were more likely to terminate than 
those who did not agree with this statement. 
From other data, we also found that new 
agents having a realistic job concept were 
more likely to survive than those whose job 
expectancy was not as accurate. 

From these two findings, we hypothesized
that potential agents who are given a clear
picture of their job duties are more likely to
survive on the job.


Procedure 


This study was done in one insurance company. 
A questionnaire was devised asking for the approxi- 
mate amount of time spent in each of a number of 
different job activities such as collecting, servicing, 
prospecting, selling, etc. These questionnaires were 
sent to all agents of the company with a request 
that they be completed and returned to the Life
Insurance Agency Management Association. The
results for each question were tallied, and the
median number of hours was computed for each
activity.


All length of service groups were combined since we 
were interested in an approximate composite picture. 
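The tallying step amounts to taking a median per activity across all returned questionnaires. The sketch below rests on three assumptions: each questionnaire is stored as a mapping from activity name to reported hours, an activity left blank is simply skipped, and the time period the hours cover is not specified, because the article does not state it. The function name and the data are hypothetical.

```python
from statistics import median


def median_hours_by_activity(questionnaires):
    """Median reported hours for each activity, pooling all length-of-service groups.

    questionnaires: list of dicts such as {"collecting": 12, "selling": 8}.
    Activities an agent left blank are omitted from that activity's median.
    """
    activities = sorted({name for q in questionnaires for name in q})
    return {
        name: median(q[name] for q in questionnaires if name in q)
        for name in activities
    }


# Hypothetical returns from three agents.
returns = [
    {"collecting": 10, "servicing": 4, "prospecting": 5, "selling": 8},
    {"collecting": 12, "servicing": 6, "prospecting": 3, "selling": 6},
    {"collecting": 14, "servicing": 5, "selling": 9},
]
print(median_hours_by_activity(returns))
```

One reason to prefer the median for a composite picture like this is that a few agents reporting unusually long hours cannot pull it far.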

From the results of this questionnaire, a booklet 
was made up consisting of a brief introduction stat- 
ing that the hours shown for each activity in the 
booklet were approximate but should give the ap- 
plicant a fair idea of how he would be spending his 
time if he were hired for the job. The rest of the 
booklet consisted of sketches showing an agent en- 
gaged in each of the various activities, a brief de- 
scription of the activity, and the approximate num- 
ber of hours agents currently employed spent in 
each activity. 

The company supplied us with a list of their dis- 
tricts (offices), the number of agents in each office, 
the number of terminators per district, and the 
number of open debit weeks¹ for the preceding
year. Matches were made by district, taking into 
account the geographical location of the district, the 
termination rate, and the number of open debit 
weeks. 

With juggling, we were able to obtain quite good
matches. The termination rate in each group was
the same, and the average number of open debit
weeks was 43 in one group and 48 in the other.

By flipping a coin, we decided which group was
to be the experimental group and which the
control. The coin assigned the group of 52
districts to the experimental condition and the
group of 51 districts to the control condition.
All applicants in the experimental districts were
to receive the job description booklet; no
applicant in the control districts was to
receive it.
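The matching and random assignment can be sketched as follows. The study matched districts by hand ("with juggling"); this sketch swaps in a simple mechanical stand-in. Within each region, districts are ranked on termination rate and then on open debit weeks, and are dealt alternately to two groups. A single fair coin flip then decides which group is experimental. The `District` fields, the region labels, the function names and the generated districts are all hypothetical.

```python
import random
from dataclasses import dataclass


@dataclass
class District:
    name: str
    region: str
    termination_rate: float
    open_debit_weeks: int


def form_matched_groups(districts):
    """Deal similar districts alternately into two groups (a stand-in for hand matching)."""
    by_region = {}
    for d in districts:
        by_region.setdefault(d.region, []).append(d)
    group_a, group_b = [], []
    k = 0  # running counter across regions keeps the group sizes within one
    for region in sorted(by_region):
        ranked = sorted(by_region[region],
                        key=lambda d: (d.termination_rate, d.open_debit_weeks))
        for d in ranked:
            (group_a if k % 2 == 0 else group_b).append(d)
            k += 1
    return group_a, group_b


def assign_by_coin_flip(group_a, group_b, rng=random):
    """One fair coin flip decides which group is experimental; returns (experimental, control)."""
    if rng.random() < 0.5:
        return group_a, group_b
    return group_b, group_a


# Hypothetical set of 103 districts in two regions.
districts = [District(f"district-{i}", "east" if i < 40 else "west",
                      (i % 7) / 100, 40 + i % 11) for i in range(103)]
a, b = form_matched_groups(districts)
experimental, control = assign_by_coin_flip(a, b, random.Random(1956))
print(len(experimental), len(control))
```

Flipping a coin once, after the groups are formed, keeps the matching itself from being influenced by which group will later receive the booklet.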

The procedure for the experimental group was
as follows:

All applications for the job of agent go to the 
home office. In the case of those persons filing an 
application in any of the experimental districts, the 
home office would send the following letter to the 
prospective agent: 


“We recently received your application for em- 
ployment as an agent with our company and want 
you to know how pleased we are that you are con- 
sidering a career with company X. It is our feeling 
that the life insurance business offers a fine career 


¹ A district is composed of a number of debits. A
debit includes a specified number of policyholders
living in a particular geographical area (several
blocks or miles) in which the agent is to collect
premiums and sell. Each agent has his own debit.
If an agent terminates, his debit is “open” until a
new agent is hired in that district for that
particular debit. The length of time the debit is
open is measured in what is called “open debit
weeks.” Of course, some agents may be hired for
new debits.




to the man who is qualified for it. Because of this, 
our responsibility for bringing into our company 
men who have the best chance of succeeding is seri- 
ous and of prime importance. 

“Very likely you are uncertain as to whether you 
should enter the life insurance business. Similarly, 
we are also uncertain as to whether you should or 
not. To the end of fulfilling our responsibility as 
stated above, and helping you make your decision, 
we would like you to read the enclosed booklet. We 
are sending this to your home so that you can study 
it at your leisure and to give you the opportunity 
of discussing it with your family. 

“The booklet describes the job of X company’s 
agent. The company wants you to know in ad- 
vance, insofar as it is possible, exactly the kind of 
work our agents do. Frankly, if this is not the kind 
of work you want to do, we want you to find it 
out now rather than later. If it is the kind of work 
you want to do, well and good. You can discuss 
further the possibility of a position with the man- 
ager who took your application. Either way, your 
action will be based on a clear concept of our job, 
which we feel very deeply is the proper way to make 
a decision. 

“Our best wishes go to you for a successful future,
whether it be with our company or another organi- 
zation. 

Sincerely,” 


This letter was accompanied by a copy of the 
booklet. 

This procedure, of course, was not carried out 
with applicants in the control districts. 


Results 


The study continued for six months, starting
in May and ending in October. During this
period 226 agents were hired in the
experimental group and 248 in the control
group. Nineteen per cent of the agents in the
experimental group terminated during this
period, whereas 27% of the control group
terminated. This difference is significant
beyond the 5% level using a one-tailed test.
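The reported comparison can be checked with a one-tailed two-proportion z test. The article does not say which test it used, so the pooled-variance z test here is an assumption. The termination counts are reconstructed from the reported rates (19% of 226 ≈ 43, 27% of 248 ≈ 67), and the function name is hypothetical.

```python
import math


def one_tailed_two_proportion_z(x_exp, n_exp, x_ctl, n_ctl):
    """Pooled two-proportion z test of H1: experimental rate < control rate.

    Returns (z, one_tailed_p).
    """
    p_exp = x_exp / n_exp
    p_ctl = x_ctl / n_ctl
    pooled = (x_exp + x_ctl) / (n_exp + n_ctl)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_exp + 1 / n_ctl))
    z = (p_ctl - p_exp) / se
    # Upper-tail area of the standard normal beyond z.
    p_one_tailed = 0.5 * math.erfc(z / math.sqrt(2))
    return z, p_one_tailed


# Counts reconstructed from the reported rates: 19% of 226 ~ 43, 27% of 248 ~ 67.
z, p = one_tailed_two_proportion_z(43, 226, 67, 248)
relative_reduction = 1 - (43 / 226) / (67 / 248)
print(f"z = {z:.2f}, one-tailed p = {p:.3f}, reduction = {relative_reduction:.0%}")
```

This gives z of roughly 2.06 and a one-tailed p of about .02. That is consistent with significance beyond the 5% level. The relative reduction of about 30% matches the figure the article reports later in this section.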

Perhaps more important, the difference in
termination rate between the two groups held
up month after month. That is to say, if we
determine the percentage of termination for
each group hired in each month and exposed
until October, we obtain the results shown in
Table 1.

As might be expected, the monthly termi- 
nation rate decreases as the last month (Oc- 
tober) of the study is approached. The rea- 
son, of course, is that there is less exposure of 
the men hired later in the study; that is to 
say, they have a shorter time in which to 
quit. For each month, however, it can be 
seen that a higher proportion of the control 
group terminated. Overall, there was a
reduction in termination of about 30% (from
27% to 19%), a meaningful figure to a company.
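The month-by-month comparison in Table 1 can be recomputed from the hired and terminated counts. The June terminated count for the experimental group is illegible in the scanned table. This sketch uses 11, an inferred value: it is the only count that reproduces both the tabled 34% (of 32 hired) and the reported overall 19%. The names of the lists and functions are hypothetical.

```python
# (month hired, number hired, number terminated through October), from Table 1.
# The experimental June count (11) is inferred, not read from the source.
EXPERIMENTAL = [("May", 41, 13), ("June", 32, 11), ("July", 28, 7),
                ("August", 50, 9), ("September", 44, 2), ("October", 31, 1)]
CONTROL = [("May", 45, 21), ("June", 39, 19), ("July", 32, 10),
           ("August", 37, 8), ("September", 42, 4), ("October", 53, 5)]


def monthly_rates(rows):
    """Percentage terminated, keyed by month of hire."""
    return {month: 100 * terminated / hired for month, hired, terminated in rows}


def totals(rows):
    """(total hired, total terminated) over the whole study."""
    return sum(h for _, h, _ in rows), sum(t for _, _, t in rows)


exp_rates = monthly_rates(EXPERIMENTAL)
ctl_rates = monthly_rates(CONTROL)
for month in exp_rates:
    print(f"{month:<10} experimental {exp_rates[month]:5.1f}%   "
          f"control {ctl_rates[month]:5.1f}%")
```

The rates fall month by month for both groups as exposure shortens. In every month, however, the control rate stays above the experimental rate.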

In order to check on the possibility that 
giving a clear picture of the job to prospec- 
tive agents might make it more difficult to 
hire a man, the proportion of open debit 
weeks (how long it takes to fill a vacancy) 
was determined for the experimental and 
control groups. You might expect that if it 
were more difficult to hire a man who was 
given a clear picture of the job, the experi- 
mental group would have a higher proportion 
of open debit weeks. This was not the case. 
The experimental group had 7.8% open debit 
weeks while the control group had 8.9% open 
debit weeks for the six-month period of the 


Table 1 
Termination Rate for Persons Hired in Each Month of the Study

                  Experimental                         Control
           N      N Terminated     %          N      N Terminated     %
Hired In   Hired  Through October  Terminated Hired  Through October  Terminated
May        41     13               32         45     21               47
June       32     11               34         39     19               49
July       28      7               25         32     10               31
August     50      9               18         37      8               22
September  44      2                5         42      4               10
October    31      1                3         53      5                9


study. While this difference is not
significant, it is in the direction opposite to
that expected. We can conclude that the
booklet certainly did not slow up the hiring
procedure.

If we examine the termination rate in the 
two groups of agents unaffected by the book- 
let, that is, those hired before the start of the 
study, we find that there is no significant dif- 
ference. There were 796 agents on the job 
in the experimental districts and 706 in the 
control districts as of the end of April. We 
determined the termination rate of these “on- 
the-job” agents during the six months of the 
study and found that in the experimental 
group 27% terminated, and 28% terminated 
in the control group. This lends more weight
to the differences found among the agents
hired during the study, since our earlier
matches apparently held up. All in all, it
appeared that something was effective.


Discussion 


The reason we say it appeared that some- 
thing was effective, rather than the job de- 
scription booklet, is this. The home office 
contact, via the letter accompanying the 
booklet, may have been part of the reason 


the system worked. This procedure perhaps 
created a favorable impression and resulted 
in higher survival in the experimental group. 

This variable could be controlled in further 
studies by issuing the booklet at the point of 
application (but would the manager issue the 
booklet?), or by having the home office send 
out a “public relations” letter to applicants 
without mentioning the job description. 

There are always many things you would 
like to do to “purify” your findings. One 
must not, however, in industrial work, purify 
to the point of sterilization. 


Conclusion 


We feel that this study shows that giving 
prospective agents a realistic concept of the 
job and having this description come from an 
“executive” source will reduce termination. 
We further found that this procedure will not 
make it more difficult to hire new agents. 


Received March 2, 1956. 


Reference 


1. Weitz, J., & Nuckols, R. C. Job satisfaction and
job survival. J. appl. Psychol., 1955, 39, 294-
300.


The Use of a Sentence Completion Test in Measuring
Attitudes Toward Superiors and Subordinates¹


Leroy S. Burwen
Research Division, Chicago Tribune
Donald T. Campbell
Northwestern University
and Jerry Kidd²
Ohio State University


This paper reports on an effort to use a
sentence-completion test to measure attitudes
toward superiors and subordinates which
might help predict the behavior of an
intermediate in the face of conflicting demands
from those who supervise him and those
whom he supervises.


Method


The data were collected from Air Force cadets,
tested on two occasions. The initial testing was
during the first week of their preflight training at
Lackland Air Force Base. Some of these same
cadets, and others, were tested one year later when
in advanced training.

The test booklet asked the respondent to “Finish
the following sentences with the first thought that
occurs to you. There are no right or wrong
answers. Work as rapidly as possible.” Of the items
in the test, 24 were deemed scorable for all
respondents on the dimension of superior-subordinate
orientation. In general, responses favorable to
superiors, favoring strict discipline, and the like have
been interpreted as superior-oriented.³ The responses
were rated on a five-point scale. To convey better
the nature of the test and its scoring, the 24 items
are presented here, along with paraphrased responses
received from the tested population and their rated
values. These responses have been selected to
indicate the range of responses made, as well as the
scoring system.


¹ This study was supported in part by the United
States Air Force under contract No. AF 18(600)-170,
monitored by the Crew Research Laboratory, Air
Force Personnel and Training Research Center,
Randolph Air Force Base, Randolph Field, Texas.
Permission is granted for reproduction, translation,
publication, use and disposal in whole and in part by
or for the United States Government.

² At the time of this study, the first author was
attending the University of Chicago, and the third
author Northwestern University.

³ The authors had the advantage of experience
with a previous scoring of the Lackland tests by
Desmond Cartwright.


1. In comparing civilian life with army life, he felt:
restricted in his freedom (2)
more secure (4)
content and safe (4)

When the commanding officer called him, he thought:
I have fouled up and will get chewed (2)
general curiosity (3)
I will be commended or given a pass (4)

He liked to be with a leader who:
was easy-going, was well liked (2)
knew his job (3)
knew how to get the work out (4)
was efficient and strict (4)

He never felt comfortable in the presence of:
his superiors (1)
a general (2)
his men (5)
enlisted men (5)

He felt the men over him were:
boneheads (1)
harsh (2)
good leaders (4)
the best (5)

The main trouble with the Air Force is:
too many chiefs and not enough Indians (2)
not strict enough discipline (4)

10. Whenever he saw his superior coming he:
threw up (1)
ducked or lied (2)
saluted (3)
gave him a warm greeting (4)
was very happy (5)

12. The way to get along in the Air Force is:
give the guys around you an even break (2)
work hard (3)
respect your C.O. (4)

13. He thought the men under him were:
the best (1)
foul balls (5)

18. When giving orders to an enlisted man:
he was kind and understanding (1)
he expected immediate compliance (5)



20. He thought the tough C.O.: 
was psycho (1) 
was the best (5) 

24. In an argument with a superior: 
never tell him what you think (2) 
keep your place (4) 
be respectful (4) 

25. The average enlisted man: 
is a good Joe (1) 
has a poor deal (2) 
is a slacker (4) 
shows no respect (5) 

26. The difference between an enlisted man and an officer is:
just rank, breaks, no difference (2) 
hard work, intelligence, quality (4) 

28. When the officer pulled rank on him: 
he thought, what a jerk (1) 
he accepted it gracefully (5) 

30. The lot of an enlisted man:
isn’t too hot (2) 
is just what he makes it (4) 

31. The difficulty in being an officer is: 
you can’t associate with enlisted men (1) 
keeping the men in line (4) 

32. The status or rank system in the service makes for:
injustice (2) 
better morale, better work (4) 

33. When ordered to do something: 
he wanted to know why (2) 
he hopped to it (4) 

34. What his men liked most about him was: 
he understood their side (2) 
he was firm (4) 

35. A poor officer is one who: 
is “chicken” (2) 
is too slack (4) 

37. The men under him disliked: 
the C.O. (2) 
his conduct as an officer (4) 

41. When bucking for a promotion: 
don’t step on the other guy’s back (2) 
make sure the right people see you (4) 

43. Military regulations: 
are for the birds (1) 
are an absolute must with me (5) 


The ratings of these items had an interjudge
reliability of .89. The internal consistency reliability,
as measured by a variant of the Kuder-Richardson
formula (6, p. 223), is .69. For 48 men who had
taken the test both in 1953 as preflight cadets and
in 1954 at the advanced training bases, the
test-retest correlation was .12.
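An internal-consistency coefficient for five-point item ratings like these can be computed with coefficient alpha. Alpha is a common generalization of the Kuder-Richardson formula to items that are not scored simply right or wrong. Whether it is the exact variant given in Gulliksen (6, p. 223) is an assumption, and the ratings in the usage line are invented.

```python
from statistics import pvariance


def coefficient_alpha(scores):
    """Coefficient alpha for a respondents-by-items matrix of ratings.

    alpha = k / (k - 1) * (1 - sum of item variances / variance of total scores)
    """
    k = len(scores[0])
    if k < 2:
        raise ValueError("need at least two items")
    item_variances = [pvariance([row[j] for row in scores]) for j in range(k)]
    total_variance = pvariance([sum(row) for row in scores])
    if total_variance == 0:
        raise ValueError("total scores have no variance")
    return k / (k - 1) * (1 - sum(item_variances) / total_variance)


# Hypothetical five-point ratings: four respondents, three items.
ratings = [[1, 2, 3], [2, 3, 3], [3, 3, 4], [4, 5, 5]]
print(round(coefficient_alpha(ratings), 3))
```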


Results 


At the time the sentence-completion tests 
were administered in 1954, a considerable 




variety of reputational criterion measures 
were also obtained, some from administrative 
records, such as grades and Military Apti- 
tude Ratings, and others from nominations 
data collected by the project. The character 
and interrelationships among these criteria 
have been reported elsewhere (3). Correla- 
tions with all 13 of the available criteria were 
nonsignificant. All but one were below .06.
The highest value was .13, with Flying Training
Grade, based on an N of 225. This approaches,
but does not reach, significance at the .05 level.

The Sentence Completion score has also 
been correlated with other attitude measures 
included in the 1954 testing. Its correlation 
with the Leadership Knowledge (5) attitude 
score is .27, based on an N of 312, significant
beyond the .001 level. The correlation with
the Superior-Subordinate Cluster (4) is .32;
with the Alienation cluster (4), −.45; with
the F scale (1), .01; and with Identification
with Discipline (4), .25. All values above
.19 are significant beyond the .001 level.
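The significance statements for these correlations can be checked approximately. This sketch swaps in Fisher's r-to-z transformation, a standard large-sample approximation; the article does not say which test it used. It tests the highest criterion correlation (.13, N = 225) and the .19 threshold at N = 312. The function name is hypothetical.

```python
import math


def correlation_p_value(r, n):
    """Approximate two-tailed p for H0: rho = 0, via Fisher's r-to-z transformation.

    z = atanh(r) * sqrt(n - 3) is approximately standard normal under H0.
    """
    if n <= 3:
        raise ValueError("need n > 3")
    z = math.atanh(r) * math.sqrt(n - 3)
    return math.erfc(abs(z) / math.sqrt(2))


p_flying = correlation_p_value(0.13, 225)   # highest correlation with a criterion
p_cutoff = correlation_p_value(0.19, 312)   # smallest value called significant at .001
print(f"r = .13, N = 225: p = {p_flying:.3f}")
print(f"r = .19, N = 312: p = {p_cutoff:.4f}")
```

The first p falls just above .05 and the second just below .001, in line with the text. An exact t test on N − 2 degrees of freedom gives nearly the same values at these sample sizes.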


Discussion 


The values of .27 with Leadership Knowl- 
edge, and .32 with Superior-Subordinate clus- 
ter, taken with the correlation of .47 between 
the latter two (5), complete a triangulation 
which supports the “construct validity” of all 
three. The significant correlation of Sentence 
Completion and Superior-Subordinate Orien- 
tation Cluster with Identification with Disci- 
pline augments this picture. The high nega- 
tive correlation with the Alienation cluster 
does not help, however, since all but the Sen- 
tence Completion Test are independent of it. 
And, of course, the picture of construct va- 
lidity is weakened by the total absence for 
two of the three tests (Sentence Completion 
and Leadership Knowledge) of significant cor- 
relations with reputational measures intended 
to get at the same dimension. 


Summary 


A sentence-completion test designed to 
measure attitudes toward superiors and sub- 
ordinates was administered to 312 Air Force 
cadets in advanced training. The test was 
scored with acceptable reliability, and showed 




a correlation of .32 with a direct attitude 
measure of the same dimension, and of .27 
with an indirect measure based on an
information test. Interpretation of these values
is restricted by a correlation of −.45 with a
direct scale of alienation and by the absence
of significant correlations with reputational
criterion measures.


Received October 13, 1955. 


References 


1. Adorno, T. W., Frenkel-Brunswik, Else, Levinson,
D. J., & Sanford, R. N. The authoritarian
personality. New York: Harper, 1950.

2. Campbell, D. T. The indirect assessment of social
attitudes. Psychol. Bull., 1950, 47, 15-38.

3. Campbell, D. T. Intercorrelations among leadership
criteria on a population of Air Force cadets.
Unpublished draft research report submitted for
monitor’s approval, Jan., 1955.

4. Campbell, D. T., Burwen, L. S., & Chapman, J. P.
Assessing attitudes toward superiors and
subordinates through direct attitude statements.
Unpublished draft research report submitted for
monitor’s approval, Jan., 1955.

5. Campbell, D. T., & Damarin, F. Measuring
leadership attitudes through an information test.
Unpublished draft research report submitted for
monitor’s approval, Jan., 1955.

6. Gulliksen, H. Theory of mental tests. New York:
Wiley, 1950.




A Validation Study of the Prediction of College Achievement 


J. W. Frick 
University of Southern California 


and Helen E. Keener 
University of California, Santa Barbara College 


The senior author has previously presented 
the findings of an attempt to improve the pre- 
diction of college freshman academic achieve- 
ment by use of the Minnesota Multiphasic 
Personality Inventory (MMPI) in conjunc- 
tion with the usual scholastic aptitude test (1). 
The results indicated that the coefficient of 
determination derived from a weighted com- 
posite of aptitude and personality scores was 
.41, while that derived from aptitude scores
alone was only .23. A regression equation
was given for the prediction of grade-point 
average (GPA), based on an experimental 
group of 267 freshman women at the Univer- 
sity of California, Santa Barbara College. 

This study presents the results of an at- 
tempt at cross validation of the previous find- 
ings, in an additional sample of 200 freshman 
women at the same institution. 


Method 


The cross-validation group was selected on the 
same basis as the experimental group, i.e.,
completion of the two semesters of the freshman year, a
score on the American Council on Education Psy- 
chological Examination (ACE), and a research-valid 
score on the MMPI. Means and standard deviations 
on the relevant variables for the two groups were 
computed and compared (Table 1). Distributions 
and scatter plots of the scores on all variables for 
the validation group indicated a similarity to those 
found for the experimental group, with symmetry 
and linearity in all scales except GPA and the D 
scale of the MMPI. As was done with the experi- 
mental group, these two scales were normalized by 
T-scaling.

Scores on each predictor variable for each indi- 
vidual of the validation group were then inserted 
into the following previously obtained regression 
equation, derived from the experimental group: 

$$X' = 79.6 + .1476\,\mathrm{ACE} - .5490\,\mathrm{Hs} - .0125\,\mathrm{D} - .9012\,\mathrm{Pd} + 1.0127\,\mathrm{Pa} - .4592\,\mathrm{Sc} - .4523\,\mathrm{Ma}$$
In this equation, X’ represents the predicted GPA 
(in T scores) for the individual, while the terms on 
the right side represent the sum of a constant plus 
weighted individual scores on the ACE and selected 
MMPI scales. 
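Applying the regression equation to one student's scores is a weighted sum. This sketch assumes that the weights are read with the decimal points restored, since the scanned equation appears to have dropped them, and that D is entered as its T-scaled score, as the Method describes. The dictionary and function names are hypothetical.

```python
# Weights as read from the regression equation, decimal points restored (an assumption).
INTERCEPT = 79.6
WEIGHTS = {
    "ACE": 0.1476,
    "Hs": -0.5490,
    "D": -0.0125,
    "Pd": -0.9012,
    "Pa": 1.0127,
    "Sc": -0.4592,
    "Ma": -0.4523,
}


def predicted_gpa_t(scores):
    """Predicted GPA, in T-score units, from ACE and selected MMPI scale scores."""
    missing = WEIGHTS.keys() - scores.keys()
    if missing:
        raise KeyError(f"missing scores for: {sorted(missing)}")
    return INTERCEPT + sum(weight * scores[scale] for scale, weight in WEIGHTS.items())


# Because the equation is linear, evaluating it at the group means gives the
# mean of the predicted scores for that group.
cross_validation_means = {"ACE": 48.0, "Hs": 13.2, "D": 50.0, "Pd": 21.2,
                          "Pa": 9.4, "Sc": 25.4, "Ma": 19.1}
print(round(predicted_gpa_t(cross_validation_means), 1))
```

With the rounded cross-validation means from Table 1, this gives a mean predicted GPA of about 48.9 T-score units.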

After a predicted GPA had been computed for 
each individual, these predicted scores were corre- 


lated with obtained GPA’s for the entire group of 
200. Since the correlations between GPA and the 
other variables had been corrected for errors of 
measurement in the criterion (GPA) in the experi- 
mental group, this same procedure was followed for 
the cross-validation group. While the reliability of 
the GPA for the two semesters was known for the 
experimental group, it was not available for the validation
group, and was therefore estimated by the method 
suggested by Guilford (2). Since GPA had been T 
scaled for both groups, the SDs were identical, and 
therefore the estimate of reliability (corrected for two 
semesters) was the same for both groups. 
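The two corrections mentioned here can be written out. This sketch assumes that "corrected for two semesters" means a Spearman-Brown step-up from a one-semester reliability to a two-semester composite. It then applies the standard correction for attenuation to the criterion only. Guilford's method for estimating the reliability itself is not reproduced, and the input values are hypothetical.

```python
import math


def spearman_brown(r_part, length_factor=2.0):
    """Reliability of a composite `length_factor` times as long as its part."""
    return length_factor * r_part / (1 + (length_factor - 1) * r_part)


def correct_for_criterion_unreliability(r_xy, r_yy):
    """Correct a validity coefficient for measurement error in the criterion only."""
    if not 0 < r_yy <= 1:
        raise ValueError("criterion reliability must lie in (0, 1]")
    return r_xy / math.sqrt(r_yy)


# Hypothetical inputs: one-semester GPA reliability of .68, observed validity of .45.
r_gpa = spearman_brown(0.68)
print(round(r_gpa, 2), round(correct_for_criterion_unreliability(0.45, r_gpa), 2))
```

Correcting only for the criterion, not the predictors, estimates how well the predictors would forecast a perfectly reliable GPA.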

The zero-order correlation between GPA and ACE
in the experimental group was found to be .48. 
These two measures were also correlated in the 
cross-validation group for purposes of comparison. 


Results 


The original multiple correlation between 
GPA and a weighted composite of ACE and
MMPI scales, in the experimental group, was 
found to be .64. In the cross-validation 
group, the correlation between predicted and 
obtained GPA’s was .54. In the same group 
the zero-order correlation between GPA and 
ACE was .50. In this group, therefore, the 
coefficient of determination of .25 yielded by 
the ACE scores alone is slightly higher (.02) 
than that found in the experimental group, 
while the coefficient of multiple determina- 
tion of .30 is considerably (.11) lower. The 
shrinkage in correlation from the experimen- 
tal to the validation groups between the pre- 
diction variables and the criterion appears to 
have been subject to the regression phenome- 
non common to predictive equations derived 
by the least-squares method, and probably 
magnified by the use of seven independent 
variables in the equation. 
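The coefficients of determination quoted above are the squares of the corresponding correlations. The short sketch below reproduces that arithmetic from the four reported coefficients. Because it squares the rounded values, its figures (.2304 vs. .2500 for ACE alone, .4096 vs. .2916 for the multiple correlation) can differ from the text's .30 and .11 in the second decimal, presumably because the authors worked from unrounded coefficients.

```python
# Squared correlations (coefficients of determination) from the Results text.
r = {
    "ACE, experimental": 0.48,
    "ACE, cross-validation": 0.50,
    "multiple, experimental": 0.64,
    "predicted vs obtained, cross-validation": 0.54,
}
r_squared = {label: value ** 2 for label, value in r.items()}

ace_change = r_squared["ACE, cross-validation"] - r_squared["ACE, experimental"]
multiple_change = (r_squared["predicted vs obtained, cross-validation"]
                   - r_squared["multiple, experimental"])

for label, value in r_squared.items():
    print(f"{label:42s} {value:.4f}")
print(f"ACE alone: {ace_change:+.4f}   multiple: {multiple_change:+.4f}")
```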

It will be noted (Table 1) that the mean 
of the predicted GPA for the cross-validation 
group is quite close to the obtained mean, 
while means for the prediction variables in 
both groups are similar enough to warrant the 
assumption that both samples were selected 
from the same population. 




J. W. Frick and Helen E. Keener 


Table 1
Means, Standard Deviations, and GPAs, Both Groups

                   Experimental      Cross-Validation    GPA (Cross-Validation Group)
Prediction         Group             Group               Mean       Mean       SD         SD
Variable           Mean     SD       Mean     SD         predicted  obtained   predicted  obtained
ACE                47.3     27.74    48.0     26.45
Hs                 13.0     2.93     13.2     2.87
D*                 50.0     10.0     50.0     10.0
                   (18.4)   (4.34)   (18.2)   (3.22)
Pd                 21.0     3.32     21.2     3.61
Pa                 9.1      2.52     9.4      3.12
Sc                 25.0     4.32     25.4     4.31
Ma                 18.1     3.75     19.1     4.02
GPA*               50.0     10.0     50.0     10.0       49.4       50.0       7.27       10.0
                   (1.319)  (.505)   (1.317)  (.560)

* Normalized by T scaling. Original means and standard deviations in parentheses.


Discussion 


Inspection of the data indicates that in 
some cases there were deviations in obtained 
GPA from that predicted, without concomitant
aberrant scores in any of the prediction
variables. This was especially true of a group
of 50 “problem” students, whose prediction 
scores were within the normal range but 
whose performance as measured by GPA was 
extremely poor. Possibly certain personality 
variables exogenous to the areas measured by 
the MMPI are responsible for this deviation. 
The authors would hazard a guess that these 
deviant achievement scores arose from the in- 
ability of some freshman women to adjust to 
the college routine, social difficulties, and 
various other frustrations to which the col- 
lege woman, more than the college man, ap- 
pears to be prone. It is also possible that 
these difficulties had not yet made their pres- 
ence known at the time of matriculation, 
when the ACE and MMPI were administered, 
and therefore did not enter into the predictive 
measures. Since such deviates appear in most 
college populations, however, the authors did 
not feel justified in excluding this particular 
group from the study. 

The expected shrinkage of the multiple- 
correlation coefficient from the experimental 
group to the validation group may be at- 
tributed to (a) the regression phenomenon, 
aggravated by the use of seven prediction 
variables; (b) sampling errors in either or 
both groups; (c) the tendency of the least-squares
method of computation of the multiple-correlation
coefficient to exploit any chance
relationships present; and (d) the fact that one-fourth
of the cross-validation group was widely deviant
in performance but not in the predictive variables.

Summary 


1. A regression equation derived from the 
ACE and six clinical scales of the MMPI 
in an experimental group of 267 freshman 
women at the University of California, Santa 
Barbara College, was applied in the predic- 
tion of GPA to a similar cross-validation 
group at the same institution. 

2. The multiple-correlation coefficient be- 
tween prediction variables and GPA in the 
experimental group was .64. The correlation 
between predicted and obtained GPA in the 
cross-validation group was .54. Both coeffi- 
cients were corrected for errors of measure- 
ment in the criterion, with the reliability of 
the criterion estimated as the same for both 
groups. 

3. The shrinkage in the coefficient of de- 
termination from the experimental to the vali- 
dation group can be attributed to the regres- 
sion phenomenon, sampling errors, and the 
influence of variables not measured by the 
prediction scales. 


Received August 22, 1955. 


References 


1. Frick, J. W. Improving the prediction of academic
achievement by use of the MMPI. J.
appl. Psychol., 1955, 39, 49-52.

2. Guilford, J. P. Psychometric methods. (2nd ed.)
New York: McGraw-Hill, 1954.




The Journal of Applied Psychology
Vol. 40, No. 4, 1956 


Predicting Grade-Point Average with a Forced-Choice 
Study Activity Questionnaire 


Genevieve Schutter and Howard Maher 
Iowa State College 


The importance of study skills and atti- 
tudes in academic achievement has been cited 
by counselors and by those responsible for 
study methods courses. Many attempts at 
measurement of study skills and attitudes 
have been made. A seeming weakness of some
study tests is item transparency. Where 
items can be answered either yes or no or to 
some degree of applicability, the respondent 
can, if he wishes, indicate that all favorable- 
appearing items apply to him while denying 
unfavorable-appearing items. For situations 
such as college admission, required attend- 
ance in study correction courses, and others 
in which an important outcome may be partly 
determined by a study test score, it would 
seem desirable to have a test not so depend- 
ent upon honesty. For example, Holtzman 
and Brown (2) report a study questionnaire 
consisting of both skill and attitude items. 
While they obtain mean validity coefficients 
of .42 and .45 for men and women, respec- 
tively, their test manual includes the authors’ 
opinion that the predictive validity of the in- 
strument might be affected by the students’ 
desire to do well on the test. Scates (7), in 
a review of Wrenn’s Study Habits Inventory, 
raises another transparency issue. He notes 
that there are easy “outs” for the student. 
The student can check a large number of 
mechanical or external reasons for low grades, 
thus establishing a façade problem for the 
counselor. 

Another area of measurement that has been 
plagued by transparency is personnel rating. 
The forced-choice technique has been shown to
control, at least in part, the biased response
sets usually found with rating scales
having readily apparent answers (8). Since
the forced-choice technique appears successful
in reducing transparency, and since there
is obviously little limit to the biasing possible
on the usual free-choice test, the present
study proposes to investigate the use of the
forced-choice technique in the study test context.


Procedure 


As a first step in the development of the test, 600 
phrases and statements were selected from “How I 
Study” essays written by 150 sophomore students. 
To keep assignments within a reasonable limit, 300 
of these items were randomly selected for use in the 
present study. Next, the 300 items were classified 
independently by six expert judges (three psychol- 
ogy department staff members and three graduate 
students with experience in counseling) into Skill 
and Attitude categories. Items were accepted into 
the categories when agreed upon by at least five of 
the six judges. As a final step, at this stage of con- 
struction, 99 freshman and sophomore students in 
psychology classes were asked to indicate the ex- 
tent to which each statement described them on a 
scale of five degrees. 

Of the students, 50 were overachievers academi- 
cally and 49 were underachievers. They were se- 
lected as follows. A regression line to predict two- 
quarter grade-point average from ACE-L score was 
drawn. This line was based on data from entering 
students in 1953. A scatter plot of students’ ACE-L 
and grade-point averages was made, and the 50 stu- 
dents who were at least seven-tenths of a standard 
error of estimate above the regression line were se- 
lected as overachievers. The 49 underachievers were 
at least seven-tenths of a standard error of estimate 
below the regression line. 

The mean response (on the five-point scale) for 
each item was computed for the high and low 
groups. The algebraic difference between the mean 
of the high group and the mean of the low group 
for each item was designated the discrimination
index. The mean response of the two groups
combined was designated the preference index for that
item.
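The two indices reduce to simple means. This is a minimal sketch. It assumes ratings are coded so that a higher number means the statement is more descriptive of the student, and it takes the combined-group mean over all respondents pooled. The source does not say whether the two group means were instead averaged. The function name `item_indices` is hypothetical.

```python
from statistics import mean


def item_indices(high_responses, low_responses):
    """Return (discrimination, preference) for one item.

    high_responses / low_responses: five-point ratings from the
    overachieving (high) and underachieving (low) groups.
    """
    discrimination = mean(high_responses) - mean(low_responses)
    preference = mean(list(high_responses) + list(low_responses))  # pooled mean
    return discrimination, preference


# Hypothetical ratings for one statement.
print(item_indices([5, 4, 5, 4], [2, 3, 2, 3]))
```

A positive discrimination index marks a statement endorsed more by overachievers. A high preference index marks a statement that both groups find favorable in appearance.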

Thirty blocks of five statements each were assembled
using the preference and discrimination indices
and the Skill and Attitude categories. Of the five
statements, two were equally favorable in appearance,
as determined by equal preference values. One
of these, having a comparatively large discrimination
index in favor of the high group, was designated the
Favorable Valid Statement; the other, having a
smaller discrimination index, was designated the
Favorable Nonvalid Statement. Two more of the
statements were equally unfavorable in appearance,
as determined by equally low preference values. One
of these had a comparatively large negative discrimination
index and was designated the Unfavorable
Valid Statement. The other had a smaller
discrimination index and was designated the
Unfavorable Nonvalid Statement. The difference between
the discrimination indices of the Valid and Nonvalid
statements, for both the favorable and the unfavorable
pairs, was never smaller than .5, a number of
preliminary tests having indicated that for item pairs
chosen randomly this difference was either statistically
significant (.05 level) or closely approached
significance. The fifth statement was designated the
Neutral Statement and had a medium preference index
and a medium discrimination index. It may be recognized
that this procedure constitutes the Richardson
forced-choice system as described by Highland and
Berkshire (3).
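The block-assembly rules can be expressed as a set of checks on the five statements' indices. The sketch below is one possible reading, not the authors' procedure. "Equal" preference is approximated by a hypothetical tolerance `pref_tol`. "Medium" is taken to mean lying between the favorable and unfavorable pairs. The .5 minimum gap between Valid and Nonvalid discrimination indices is the `min_gap` parameter.

```python
from dataclasses import dataclass


@dataclass
class Statement:
    text: str
    preference: float      # mean rating, high and low groups combined
    discrimination: float  # mean(high group) - mean(low group)


def check_block(fv, fn, n, uv, un, pref_tol=0.1, min_gap=0.5):
    """Return the list of violated constraints (empty list = acceptable block).

    fv/fn: Favorable Valid/Nonvalid; n: Neutral; uv/un: Unfavorable Valid/Nonvalid.
    """
    problems = []
    if abs(fv.preference - fn.preference) > pref_tol:
        problems.append("FV and FN differ in preference")
    if abs(uv.preference - un.preference) > pref_tol:
        problems.append("UV and UN differ in preference")
    high = min(fv.preference, fn.preference)
    low = max(uv.preference, un.preference)
    if high <= low:
        problems.append("favorable pair does not appear more favorable than unfavorable pair")
    if fv.discrimination - fn.discrimination < min_gap:
        problems.append("FV does not exceed FN by the minimum gap")
    if un.discrimination - uv.discrimination < min_gap:
        problems.append("UV is not sufficiently more negative than UN")
    if not low < n.preference < high:
        problems.append("Neutral preference is not intermediate")
    if not uv.discrimination < n.discrimination < fv.discrimination:
        problems.append("Neutral discrimination is not intermediate")
    return problems


# Hypothetical statements forming an acceptable block.
good = [Statement("fv", 4.0, 0.9), Statement("fn", 4.05, 0.2), Statement("n", 3.0, 0.0),
        Statement("uv", 2.0, -0.9), Statement("un", 2.05, -0.2)]
print(check_block(*good))  # []
```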

In addition to the above method of fitting alternatives
into blocks according to their validity, it was
necessary to devise a scheme for arranging the
Attitude-Skill dimensions within the blocks. It was
decided to make this arrangement so that, of the two
dimensions (Validity-Nonvalidity and Attitude-Skill),
only one would vary at a time. Thus, the subject
would be required to operate in only one dimension
at a time.

Nineteen blocks were arranged according to the 
scheme in Table 1. 

Seven of the blocks were Form A; 12 were Form 
B. An additional eight blocks were constructed so 
that the Favorable and Unfavorable couplings were 
mixed with respect to S or A. These blocks were 
scorable for total score but not for S or A score. 
Within blocks the statements were arranged alpha- 
betically according to the first letter of the first 
word of the statement. Again, once all blocks were
constructed, they were randomized for order of ap- 
pearance on the questionnaire by means of tables of 
random numbers. 

The instructions for taking the test required the 
students to pick two statements from each block: 
the one most like their study attitude or practice 
and the one least like their study attitude or prac- 
tice. 

Table 1
Scheme for 19 Blocks

Block Form A                 Block Form B
Validity     Area            Validity     Area
FV*          A               FV           S
FN           A               FN           S
N            A/S/NA          N            A/S/NA
UV           S               UV           A
UN           S               UN           A

* FV = Favorable Valid; FN = Favorable Nonvalid; N =
Neutral; UV = Unfavorable Valid; UN = Unfavorable Nonvalid;
A = Attitude; S = Skill; NA = No Area.

Three groups of freshman and sophomore subjects,
designated A, B, and C, were selected from psychology
classes. Groups A and B were used as validation
groups; group C was retained as the cross-validation
group. Twenty-six per cent of group A, 19
per cent of group B, and 24 per cent of group C
were from the original criterion groups that filled
out the earlier questionnaire. Each group had 50
students who were below or on the regression line
to predict grade point from ACE-L score and 50
who were above, making 100 students in each group
in a continuous grade-point distribution. The three
groups were selected so that within each group
individuals were matched on ACE-L score, sex, and
distance above and below the regression line. As
much as possible, individuals were also matched
among the three groups on the same variables. That
matching was effective is shown by the fact that,
by F-ratio test, there were no significant differences
among samples A, B, and C as regards grade-point
average, ACE-L score, or interactions. At the
same time, within groups there was a significant
difference on grade-point average but not on ACE-L
score.

The forced-choice answer sheets for groups A and 
B were used in making the scoring key. The first 
step was to identify statements more often chosen 
by overachievers and, conversely, those more fre- 
quently chosen by underachievers. Consequently, 
for each statement the percentage of persons in the 
upper half of group A who chose it as most like 
them and again as least like them was obtained, as 
was the percentage of persons in the lower half of 
group A choosing it as most and, again, as least like 
them. This same procedure was followed for the 
upper and lower halves of group B. The critical 
ratios of the differences of these percentages of upper 
and lower halves of group A were read from Mosier 
and McQuitty’s nomograph (4). The same pro- 
cedure was followed for group B. These two sets of 
critical ratios were transformed to probabilities and 
the two sets of probabilities combined into a com- 
pound probability via Baker’s Tables (1). Weights 
were assigned to alternatives in the blocks according 
to the following scheme: 


Weight     Compound Probability

1          2% < P < 5%
2          1% < P < 2%
3          0% < P < 1%


Finally, the ten least valid blocks, in terms of number
of differentiating statements, were discarded.
Positive weights were assigned when the choices of 
an alternative were greater for the upper group, 
negative weights in the opposite condition. 
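The scoring-key procedure can be approximated in code, with explicit substitutions. The critical ratio uses the conventional pooled standard error for a difference between two proportions in place of the Mosier-McQuitty nomograph. The two probabilities are combined with Fisher's method, which for two probabilities gives $p = p_A p_B(1 - \ln p_A p_B)$, in place of Baker's tables, whose exact procedure may differ. Further assumptions are that both validation groups must agree in direction, that each half of a validation group is 50 students, and that a P falling exactly on 1, 2, or 5 per cent is assigned arbitrarily.

```python
import math


def critical_ratio(p_upper, p_lower, n_upper, n_lower):
    """z for the difference between two independent proportions (pooled SE)."""
    pooled = (p_upper * n_upper + p_lower * n_lower) / (n_upper + n_lower)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_upper + 1 / n_lower))
    return (p_upper - p_lower) / se if se > 0 else 0.0


def two_tailed_p(z):
    """Two-tailed normal probability for a critical ratio z."""
    return math.erfc(abs(z) / math.sqrt(2))


def fisher_combined_p(p_a, p_b):
    """Fisher's method for two independent p values (chi-square, 4 df)."""
    product = p_a * p_b
    if product <= 0.0:
        return 0.0
    return product * (1 - math.log(product))


def alternative_weight(upper_a, lower_a, upper_b, lower_b, n_half=50):
    """Signed weight (-3..3) for one alternative and one response type
    ("most" or "least"), from choice proportions in groups A and B."""
    z_a = critical_ratio(upper_a, lower_a, n_half, n_half)
    z_b = critical_ratio(upper_b, lower_b, n_half, n_half)
    if z_a * z_b <= 0:  # ASSUMPTION: the two groups must agree in direction
        return 0
    p = fisher_combined_p(two_tailed_p(z_a), two_tailed_p(z_b))
    if p < 0.01:
        weight = 3
    elif p < 0.02:
        weight = 2
    elif p < 0.05:
        weight = 1
    else:
        weight = 0
    # Positive when the upper group chose the alternative more often.
    return weight if z_a > 0 else -weight


# Hypothetical: 60% of each upper half and 30% of each lower half chose it.
print(alternative_weight(0.6, 0.3, 0.6, 0.3))  # 3
```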

In an effort to determine the relative importance 
of the mechanics of studying and attitudes toward 
study, the answer sheets for group C were scored 
twice again. One scoring was made for the Skill 
statements and the other for the Attitude statements. 
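Once the key exists, scoring an answer sheet is a table lookup. Each block contributes the weight of the statement marked "most like me" (for that response) plus the weight of the statement marked "least like me". The sketch below assumes a hypothetical key indexed by (block, statement, response). It simplifies the Skill and Attitude scorings to summing only the statements tagged with the requested area, and does not model the rule that an area is scored only when both members of a pair share it.

```python
def score_sheet(answers, key, areas=None, area=None):
    """Sum the keyed weights for one answer sheet.

    answers: {block: (most_like_me, least_like_me)}
    key:     {(block, statement, "most" | "least"): signed weight}
    areas:   {(block, statement): "S" | "A" | "NA"}; when `area` is given,
             only statements tagged with that area are scored.
    """
    total = 0
    for block, (most, least) in answers.items():
        for statement, response in ((most, "most"), (least, "least")):
            if area is not None and (areas or {}).get((block, statement)) != area:
                continue
            total += key.get((block, statement, response), 0)
    return total


# Hypothetical key, area tags, and one student's choices.
key = {(1, "a", "most"): 2, (1, "c", "least"): -1, (2, "b", "most"): 1}
areas = {(1, "a"): "S", (1, "c"): "A", (2, "b"): "A"}
answers = {1: ("a", "c"), 2: ("b", "d")}
print(score_sheet(answers, key), score_sheet(answers, key, areas, "S"))
```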




Results 


Reliability. The assumption of homoge- 
neity of blocks was not warranted since the 
blocks had different total weights and differ- 
ent proportions of Skill, Attitude, and No 
Area statements. In an effort to obtain more 
equivalent subtests, a modified form of the 
odd-even method of computing reliability was 
used. The blocks were ranked according to 
weights in the three categories of Skill, Atti- 
tude, and No Area; the odd-even blocks of 
this ranked order constituted the two subtests. 
The reliability coefficients were found to be 
.62, .70, and .72 and when stepped up by the 
Spearman-Brown formula were .76, .82, and 
.83 for groups A, B, and C respectively. 

Validity. The validity of the test was com- 
puted by means of a Pearson product-moment 
coefficient of correlation between test score 
and the cumulative grade-point average for 
fall and winter quarters. These coefficients 
were .58, .51, and .36 for groups A, B, and 
C, respectively. As previously indicated, 
weights for the scoring key were based upon 
the responses of groups A and B. Group C, 
the cross-validation group, shows the usual 
effects of shrinkage when the test is applied 
to a new sample. Even so, the shrinkage 
would appear to be slight and the coefficient 
of .36 remains significantly different from 
zero. For N = 100 an r of .25 is significantly 
different from zero at the .01 level. 
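Whether a correlation differs significantly from zero is conventionally judged from $t = r\sqrt{N-2}/\sqrt{1-r^2}$ with $N - 2$ degrees of freedom, referred to a table of t. A minimal sketch, with `t_for_r` a hypothetical helper name:

```python
import math


def t_for_r(r, n):
    """t statistic, with n - 2 degrees of freedom, for testing r against zero."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r * r)


print(round(t_for_r(0.36, 100), 2))  # group C cross-validity: 3.82
print(round(t_for_r(0.25, 100), 2))  # the .25 benchmark cited in the text
```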

An important consideration in the validity
of the study test is its relationship to
intelligence. It will be recalled that, in the
construction of the test, an effort was made to
match high and low scholastic groups on
ACE-L score. It was hoped that this matching
would result in a predictive validity for
the study test independent of intelligence.
For group C, the cross-validation group, the
correlation of ACE-L score with grades is .41,
the study test validity is .36, and the
intercorrelation of ACE-L score and study test score
is only .07. From these figures it would seem that
the efforts to control for intelligence were
successful and that the two tests are practically
independent measures. Some increase
in this low intercorrelation might be anticipated
for future samples where controls for
ACE-L score are not exerted. For the present
group C, however, a combination of the
two tests predicts grades better than either
one alone. The two zero-order correlations,
when combined, give a multiple correlation
$R_{1.23} = .53$, where:


1 = Cumulative grade-point average 
2 = ACE-L score 
3 = Forced-choice Study Test 


Thus, while the best single predictor is the 
ACE-L score, the addition of the study test 
raises the prediction by .12 over that of 
ACE-L alone. It would appear, therefore, 
that the use of the study test would make for 
prediction of scholastic achievement over and 
above that obtainable with ACE-L score 
alone. 
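The multiple correlation for two predictors follows from the three zero-order coefficients:

$$R_{1.23}^2 = \frac{r_{12}^2 + r_{13}^2 - 2r_{12}r_{13}r_{23}}{1 - r_{23}^2}.$$

A minimal sketch using the group C values. It assumes that the .07 intercorrelation is the one between ACE-L score and the study test (variables 2 and 3). With that reading the formula reproduces the reported .53 and the gain of .12 over ACE-L alone.

```python
import math


def multiple_r(r12, r13, r23):
    """Multiple correlation of variable 1 with variables 2 and 3, from zero-order rs."""
    r_squared = (r12 ** 2 + r13 ** 2 - 2 * r12 * r13 * r23) / (1 - r23 ** 2)
    return math.sqrt(r_squared)


# 1 = cumulative GPA, 2 = ACE-L score, 3 = forced-choice study test (group C)
R = multiple_r(r12=0.41, r13=0.36, r23=0.07)
print(round(R, 2), round(R - 0.41, 2))  # 0.53 0.12
```

Because $r_{23}$ is so small, $R_{1.23}^2$ is close to the simple sum $r_{12}^2 + r_{13}^2$. This is the sense in which the two tests make nearly independent contributions.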

Skill and attitude components of the test.
Since the test was composed of two distinct 
types of items, the skills or mechanics of 
study, and attitudes toward study, it is ap- 
propriate to investigate whether one appeared 
to be more valid than the other. As previ- 
ously indicated, Skill or Attitude alternatives 
were scored only if the other member of the 
pair was in the same (Skill or Attitude) area. 
A coefficient of correlation of .59 between 
scores on the 12 Skill pairings and the 14 
Attitude pairings suggests that the two types 
of items tend to vary together, and that a 
high score on Attitude is likely to be associated
with a high score on Skills. In terms
of comparative validity also, there would ap- 
pear to be little choice. The correlations 
with grades are found to be .28 and .23 (sig- 
nificant at .05) for Attitude and Skill scores, 
respectively. From all data taken together it 
appears that both Skill and Attitude state- 
ments are about equally valid, but are not 
making independent contributions to the total 
validity. 

Other relationships. Are there significant 
score differences between men and women 
and between freshmen and sophomores? To 
determine the extent of relationship of sex 
and class membership with scores, two point- 
biserial correlations were computed. The 
point-biserial coefficient of correlation of test 
score with sex is .13, which is not significantly 
different from zero at the .05 level. It would 
appear that, although no attempt was made
to control for sex item differences at earlier
stages of development, no total score sex dif- 
ferences have been introduced into the test. 
The point-biserial coefficient of correlation 
of test score with class (r = .14) was not sig- 
nificantly different from zero at the .05 level. 
The test is thus probably suitable for use 
with either freshman or sophomore groups. 


Discussion 


As indicated previously, the shrinkage from 
validation samples to the cross-validation 
sample appears slight. However, the cross-validity,
in terms of forced-choice scales, is
somewhat disappointing. Moreover, consid- 
ering the individual blocks, the results are 
not up to par. The probability weights are 
generally below those found with other 
forced-choice scales. Again, the attrition of 
blocks from the questionnaire to the forced- 
choice stage is rather great, 8 of the 30 blocks 
tested carrying no weight for any of the al- 
ternatives. And of the 20 blocks accepted 
for the final form, 3 carry only one weight. 

We are led to the speculation that some of 
this washing out of blocks may be occasioned 
by the nature of the criterion used to select 
block alternatives. In the first place, the ex- 
perimental design poses as a target only that 
portion of the criterion variance not ac- 
counted for by ACE-L score. Furthermore, 
a continuous distribution of under- and over- 
achievement was used as the criterion. Lesser 
prediction might be expected in that some of 
the discrepancy between predicted and actual 
achievement lies close enough to the regres- 
sion line to be merely a “chance” miss. 

Nevertheless, in spite of lessened validity, 
the experimental design is still, in the opinion
of the authors, the preferred one. Failure
to control for intelligence in the criterion
may lead to a loading of a study test with 
intelligence items. Again, even though vali- 
dation on continuous rather than extreme 
group criteria may have led to greater block 
attrition, surely, in everyday application of 
a test, the convenience of extreme groups can- 
not be expected. 

Another disturbance may have arisen from 
the method used to assemble the forced-choice 
blocks. The rather small difference between
the discrimination indices of the valid and
nonvalid statements was employed in an attempt
to reduce the transparency of the
blocks. This, conceivably, could also reduce 
the validity of the blocks by “splitting the 
votes” of both good and poor students be- 
tween the valid and nonvalid alternatives. 
Probably, in future investigations, basic re- 
search is needed to indicate the optimal dis- 
crimination distances for validity, reliability, 
and nontransparency. 

Additional reduction in validity may have 
been produced by our attempt to get the atti- 
tude and skill measurements into the scale. 
Too few items were sufficiently agreed upon 
by judges to permit enough blocks to be as- 
sembled under the demands of forced-choice 
construction, i.e., discrimination, preference, 
and, in this case, attitude and skill require- 
ment. Probably a greater number of items 
should have been tried in the original ques- 
tionnaire. 

With all of the above, however, the test as 
it now stands has, at least, usable validity. 
Should the low interrelationship with intelli- 
gence test score hold for future samples, the 
test could be expected to add to the multiple 
prediction of college achievement. Further- 
more, the investigation has demonstrated the 
applicability of the forced-choice instrument 
in, to the best of our knowledge, a new area. 
There is an indication in the data that the 
same techniques used in the construction of 
forced-choice rating scales and tests will 
carry over in this context. For instance, the 
pairing of alternatives on discrimination index 
appeared helpful. Of the 31 weighted alter- 
natives in the final scale, 26 were the dis- 
criminating items when the original question- 
naire was analyzed. This would seem to in- 
dicate that the computation of discrimination 
indices is an important step in design of the 
forced-choice block. That it is not a com- 
pletely sufficient step is evidenced by the 
fact that, had all valid statements been 
scored as originally weighted, at least 40 
statements would have been weighted, i.e., 
FV and UV statements in each of 20 blocks. 
Also, had these been weighted for both most
and least descriptive responses, there would 
have been 80 scorable statements instead of 
the 31 found to be differentiating. 

The data do show, however, one difference
between forced-choice rating scales and study
tests. In the rating situation the Neutral 
Statement has sometimes functioned as a 
suppressor statement (6), i.e., a favorable- 
appearing item with negative weight for a 
“most descriptive” response or one unfavor- 
able in appearance weighted negatively when 
denied. Only one Neutral Statement served 
as a suppressor in the present investigation 
and carried only unitary weight. It would 
seem likely, therefore, that the suppressor is 
not functional in this context. A possible 
reason lies in the nature of the students’ re- 
sponses; it is clear that they do not resist un- 
favorable alternatives to the extent that raters 
do. These latter deny favorable alternatives 
or admit unfavorable ones less frequently 
(5). In this case 32 per cent of the responses 
are of this nature. Richardson (6) has con- 
tended that the suppressor works by allowing 
the rater to “damn with faint praise.” The 
underachieving student, in describing himself, 
seems to be praising with loud damns, thus 
finding little use for suppressors. 

Finally, the alternatives discriminating good 
from poor students are of some interest. The 
good student generally characterizes himself 
as using most of the techniques recommended 
in methods courses—i.e., trying to distinguish 
important from unimportant points, survey- 
ing before studying, time budgeting, etc. He 
would also appear to have a singleness of 
purpose and a serious attitude toward grades. 
The underachiever, on the other hand, would 
seem to have primarily serious motivational 
difficulties. Secondly, he marks a cluster of 
alternatives indicating that strong social in- 
terests operate to the detriment of his grade- 
point average. In fact, whereas the over- 
achiever would appear to have both high 
motivation and efficient technique, the poor 
student would appear handicapped mainly in 
terms of motivation. 


Summary 


In an attempt to introduce the lesser trans- 
parency of forced-choice technique into the 
study test area, preference and discrimina- 
tion indices were first computed from the re- 
sponses of 99 over- and underachievers to 
300 attitude, skill, and unclassified items. 




Thirty Richardson-type forced-choice blocks 
were next submitted to 300 students. Two 
groups of 100 students each were used as 
item validation groups. The cross-validity, 
obtained on a third group of 100 students,
was found to be r = .36, while the corrected
odd-even reliability was r = .83. Skill and
Attitude statements appeared to contribute 
about equally to the validity. The total test 
scores did not correlate significantly with sex 
or class (year) membership. 

Discussion of the results has centered about 
possible reasons for the lesser validity of the 
forced-choice technique in this test as com- 
pared with other areas. The use of a continu- 
ous distribution of discrepancy between ACE- 
L score and college achievement, the method 
of assembling the forced-choice blocks, and 
the reduction of the usable population of 
items by attempts to satisfy many require- 
ments are advanced as hypotheses. Again, 
differences between forced-choice responses 
on rating scales and the test are examined, 
leading to the hypothesis that suppressor al- 
ternatives may not be functional in the lat- 
ter. Finally, the data provide word pictures 
of under- and overachieving students. 


Received October 31, 1955. 


References 


1. Baker, P. C. Combining tests of significance in
cross validation. Educ. psychol. Measmt,
1952, 12, 300-306.

2. Brown, W. F., & Holtzman, W. H. Brown-Holtzman
SSHA manual. New York: Psychological
Corporation, 1953.

3. Highland, R. W., & Berkshire, J. R. A methodological
study of forced-choice performance
rating. USAF Hum. Resour. Res. Cent. Res.
Bull., 1951, No. 51-9.

4. Mosier, C. I., & McQuitty, J. V. Methods of
item validation and abacs for item-test correlation
and critical ratio of upper-lower difference.
Psychometrika, 1940, 5, 57-65.

5. Richardson, M. W. An empirical study of the
forced-choice performance report. Paper read
at Amer. Psychol. Ass., Denver, Sept., 1949.

6. Richardson, M. W. Forced-choice performance
reports: a modern merit-rating method. Personnel,
1949, 26, 205-210.

7. Scates, D. E. Study habits inventory. (Review)
In O. K. Buros (Ed.), The third mental measurements
yearbook. New Brunswick, N. J.:
Rutgers Univer. Press, 1949. Pp. 566-568.

8. Sisson, E. D. Forced choice: the new Army rating.
Personnel Psychol., 1948, 1, 365-381.


The Journal of Applied Psychology 
Vol. 40, No. 4, 1956 


Fakability of a Forced-Choice Personality Test Under 
Realistic High School Employment Conditions 


Leonard V. Gordon 
U. S. Naval Personnel Research Field Activity, San Diego¹


and Ernest S. Stapleton 
Albuquerque Public Schools 


The relative ease with which the job ap- 
plicant can falsify his responses in the con- 
ventional personality questionnaire has raised 
doubt as to the practical utility of this type 
of instrument in employment situations. It 
has been hoped that the forced-choice tech- 
nique would reduce, to some extent, the abil- 
ity or tendency of the applicant to give a 
more favorable impression of himself under 
these circumstances. 

Two general approaches have been used in 
the development of forced-choice personality 
tests. In one, only certain items are keyed 
in and it is assumed that the applicant who is 
out to make a good impression will have diffi- 
culty in determining what these items are. 
In the other, all items are keyed in, but on 
different scales. The applicant who is moti- 
vated to falsify would be faced with the task 
of identifying these scales and deciding which 
scales are important for the job. 

Longstaff and Jurgensen (3) have reported 
a fakability study on the Classification In- 
ventory, a forced-choice test which uses the 
first of these approaches. A group of stu- 
dents were asked to assume that they were 
applying for a job when taking the Inven- 
tory. At the next class meeting the Inven- 
tory was taken again with the students asked 
to assume that they were applying for voca- 
tional guidance. Means on the self-confi- 
dence key were not significantly different for 
the two administrations, and a correlation 
between scores of .50 was obtained. 

Rusmore (4) used the same design with
the Gordon Personal Profile, a forced-choice
test that has each item keyed in on one of
four different scales. Means on three of the
scales were not significantly different between
the simulated job applicant and guidance
administrations. Significant mean differences,
in favor of the job applicant administration,
were obtained on the Responsibility scale
and on the Total score, a measure of the
number of favorable responses. These
increases, equivalent to about 9 and 8 percentile
points, respectively, indicated that “individuals
have a slight tendency to show themselves
to better advantage in the Industrial selection
situation.” Correlations between administrations
for the four traits ranged from .64 to
.79, indicating that “subjects did not change
their profile patterns substantially from one
set of directions to the others.”

¹ The opinions and conclusions expressed herein
do not necessarily reflect the opinions of the Chief
of Naval Personnel or the Department of the Navy.

These two studies show, at most, very mod- 
erate increases and fairly high correlations 
between scores in simulated employment and 
guidance situations on forced-choice tests. 
Longstaff and Jurgensen (3), however, in an- 
other simulated situation, found that indi- 
viduals oversold themselves substantially un- 
der a more suggestive set of instructions. 
This influence of instructions in simulated 
employment situations points to the need of 
performing studies of this type under more 
realistic conditions, where any tendencies to 
falsify would be self-induced rather than ex- 
perimentally encouraged. In this manner the 
practical utility of using forced-choice per- 
sonality tests for employment purposes can 
be more fairly judged. 

The present study was performed to determine
what differences, if any, occur in forced-choice
personality-test performance under two
conditions where some actual differential
motivation to falsify may be assumed to exist.
The design was similar to that used by Longstaff
and Jurgensen and by Rusmore, except
that the Guidance and Employment situations
had a greater degree of realism.


Table 1

Means and Standard Deviations Between Pretest and Retest Scores for the Guidance (N = 88)
and Employment (N = 121) Groups

                   Ascendency     Responsibility   Emotional        Sociability      Total
                                                   Stability
Statistic          Guid.  Empl.   Guid.  Empl.     Guid.  Empl.     Guid.  Empl.     Guid.  Empl.
Pretest mean       2.0    2.9     3.7    4.3       4.7    5.3       3.6    4.8       14.0   17.2
  SD               5.2    5.9     5.7    4.8       6.4    5.6       6.5    6.1       15.4   14.9
Retest mean        2.5    3.2     5.2    6.8       4.9    6.8       4.9    4.5       17.6   21.3
  SD               5.6    6.0     5.5    5.2       6.8    5.0       7.0    6.3       16.0   14.2
Mean difference    .5     .3      1.5    2.6       .2     1.6       1.3    -.3       3.6    4.1
t                  1.5    .8      [?]    [?]       .6     4.4**     3.3**  [?]       [?]    [?]

Guid. = Guidance group; Empl. = Employment group.
** Significant at the .01 level.
[?] Illegible in the source.

The Gordon Personal Profile (1), used in 
the study, is a brief four-factor personality 
test measuring Ascendency, Responsibility, 
Emotional Stability, and Sociability. It also 
yields a Total score which indicates the ex- 
tent to which the individual has selected com- 
plimentary rather than derogatory alterna- 
tives. 


Procedure 


Shortly after the beginning of the second semester, 
all junior and senior students in a small high school 
in Albuquerque 2 were administered the Gordon Per- 
sonal Profile after the following introduction: 


“As part of our guidance program, we are asking 
you to fill out a form called the Gordon Personal 
Profile. We will use the information we get in any 
future counseling that we may do with you. Please 
consider it an addition to our student personnel 
services which we are developing for you.”

Three months later, about two weeks before the 
close of school, students were informed that appli- 
cations for outside employment would be accepted 
through the municipal school system. This fol- 
lowed an established practice of the Youth Employ- 
ment Service to attempt to place continuing stu- 
dents in summer and part-time jobs and terminal 
students in full-time jobs. Students were asked, in 
their classrooms, whether they wished employment. 
Those who did were given a specially devised em- 
ployment blank on which to indicate the type of 
job desired, lowest salary acceptable, and other pertinent information. The Personal Profile was then readministered as an employment test. The following appeared printed on the employment blank:

“If you desire employment at the end of the school year, and wish assistance in obtaining such employment, you are asked to fill in the information requested below. You will be asked to take the Gordon Personal Profile to provide a second copy to be appended to this employment form since the original copy is not available for this purpose. Information obtained from this form and from the Gordon Personal Profile will assist us in making more effective job placements.”

2 The writers wish to express their appreciation to Dr. H. Lampman and Mr. G. Keppers of the Albuquerque Public Schools, whose cooperation made this study possible.

Students who indicated that they did not wish 
employment were given a specially devised guidance 
blank to complete, primarily to occupy their time 
while the others were completing their employment 
blanks. They were asked about their attitudes to- 
ward particular school subjects, their educational 
plans, etc. This was followed by readministration 
of the Personal Profile for guidance purposes. The 
following was printed on the guidance blank: 

“You are asked to fill in the information requested 
below for vocational guidance purposes. You will 
be asked to take the Gordon Personal Profile to pro- 
vide a second copy to be appended to this Voca- 
tional Guidance form, since the original copy is on 
file elsewhere. If at some later time you wish to 
discuss your vocational problems, information ob- 
tained from this form and from the Gordon Per- 
sonal Profile will be of assistance in providing a 
better understanding of your goals and interests.” 

In all, 209 students, 157 juniors and 52 seniors, 
completed the test on both administrations. The 
Employment group contained 121 students, 65 boys 
of mean age 17.6 years and 56 girls of mean age 
17.5 years. The Guidance group contained 88 stu- 
dents, 34 boys of mean age 18.4 years and 54 girls 
of mean age 17.9 years. The first administration of 
the test was performed by the regular classroom 
teachers. The retest was performed by the Direc- 
tor of Guidance to provide a greater sense of realism. 




Leonard V. Gordon 


Table 2

Correlations Between Scores on the Pretest and Retest for the Guidance (N = 88)
and Employment (N = 121) Groups

                                                                Emotional
Group (administrations correlated)  Ascendency  Responsibility  Stability  Sociability  Total
Guidance (guidance, guidance)             .80           .84           .87         .86      .84
Employment (guidance, employment)         .80           .68           .74         .78      .79


Results


Means and standard deviations for all traits measured, as well as Total score, and tests of significance of differences between means for the first and second administrations are presented in Table 1. It may be seen that both the Employment and Guidance groups increased their means significantly in the retest on Responsibility and Total score. In addition, the Employment group had a significant mean increase on Emotional Stability and the Guidance group on Sociability.

3 Since no significant sex differences in changes from the first to the second administration were noted, a single analysis is reported for both sexes.
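The article does not say which t test underlies Table 1. Assuming the standard test for the difference between correlated means, each t can be recomputed from the tabled means and standard deviations together with the pretest-retest correlations in Table 2. A minimal Python sketch (the function name is illustrative):

```python
import math


def t_correlated_means(mean_diff, sd1, sd2, r, n):
    """t for a mean change measured twice on the same n people (n - 1 df).

    The SD of the difference scores is recovered from the two SDs and the
    test-retest correlation: s_d**2 = sd1**2 + sd2**2 - 2 * r * sd1 * sd2.
    """
    sd_diff = math.sqrt(sd1 ** 2 + sd2 ** 2 - 2 * r * sd1 * sd2)
    return mean_diff / (sd_diff / math.sqrt(n))


# Means and SDs from Table 1; test-retest correlations from Table 2.
t_total_guidance = t_correlated_means(3.6, 15.4, 16.0, 0.84, 88)
t_soc_guidance = t_correlated_means(1.3, 6.5, 7.0, 0.86, 88)
t_asc_employment = t_correlated_means(0.3, 5.9, 6.0, 0.80, 121)

for label, t in [("Total, Guidance", t_total_guidance),
                 ("Sociability, Guidance", t_soc_guidance),
                 ("Ascendency, Employment", t_asc_employment)]:
    print(f"{label:24s} t = {t:.2f}")
```

Under this assumption the recomputed values are about 3.8, 3.4, and 0.9, close to the legible entries 3.8, 3.3, and .8; the small gaps are consistent with the one-decimal rounding of the tabled statistics.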

Correlations between the scores for the first and second administrations are presented in Table 2. These correlations, representing test-retest reliabilities for the Guidance group over a three-month interval, range from .80 to .87. For the Employment group, the correlations between the guidance and employment administrations range from .68 to .80. On Responsibility and Emotional Stability, the correlations for the Guidance group are significantly larger, at the 5% level, than those for the Employment group.
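The text does not name the test used to compare the two groups' correlations. A common choice for correlations from independent samples is Fisher's r-to-z comparison; the Python sketch below assumes that test and applies it to the Table 2 values with N = 88 and N = 121 (names are illustrative).

```python
import math


def fisher_z_difference(r1, n1, r2, n2):
    """Fisher r-to-z test for two correlations from independent samples:
    z = (atanh(r1) - atanh(r2)) / sqrt(1/(n1 - 3) + 1/(n2 - 3))."""
    se = math.sqrt(1 / (n1 - 3) + 1 / (n2 - 3))
    return (math.atanh(r1) - math.atanh(r2)) / se


# Table 2: (Guidance r, Employment r); Guidance N = 88, Employment N = 121.
TABLE_2 = {
    "Ascendency": (0.80, 0.80),
    "Responsibility": (0.84, 0.68),
    "Emotional Stability": (0.87, 0.74),
    "Sociability": (0.86, 0.78),
}

z_scores = {
    trait: fisher_z_difference(r_g, 88, r_e, 121)
    for trait, (r_g, r_e) in TABLE_2.items()
}
for trait, z in z_scores.items():
    verdict = "significant" if abs(z) > 1.96 else "not significant"  # two-tailed .05
    print(f"{trait:20s} z = {z:5.2f}  {verdict}")
```

Under this assumption only Responsibility (z ≈ 2.76) and Emotional Stability (z ≈ 2.69) exceed the two-tailed 5% criterion of 1.96, which matches the pattern reported above.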

Using the Guidance group as a control group, an analysis of covariance was run for the four traits and Total score to determine whether the increases in mean score for the Employment group were greater than might be expected from a guidance retest.


Table 3

Analysis of Variance, with Covariance Adjustments for Guidance Pretest, of Guidance and
Employment Final Test Administrations

                                Pretest          Product     Retest   Adjusted          Mean            Adjusted
Trait           Source    df         SS     F         SS         SS         SS   df   Square        F  Mean Diff.
Ascendancy      Between    1      39.10  1.23      27.30      19.06        .65    1      .65      .05        [?]
                Within   207    6585.14          5451.36    7105.83    2593.06  206    12.59
                Total    208    6624.24          5478.65    7124.89    2593.71  207
Responsibility  Between    1      19.71   .73      52.17     138.15      68.76    1    68.76    5.49*       1.16
                Within   207    5594.22          4344.19    5951.64    2578.15  206    12.52
                Total    208    5613.93          4396.37    6089.79    2646.91  207
Emotional       Between    1      16.81   .47      55.53     183.47     107.12    1   107.12   8.90**       1.45
  Stability     Within   207    7480.15          5808.41    6990.45    2480.16  206    12.04
                Total    208    7496.96          5863.94    7173.92    2587.29  207
Sociability     Between    1      65.31  1.62     -21.57       7.12      91.53    1    91.53    6.25*      -1.41
                Within   207    8329.89          7148.71    9151.10    3016.08  206    14.64
                Total    208    8395.20          7127.14    9158.22    3107.61  207
Total           Between    1     545.00  2.36     630.21     728.74      65.42    1    65.42      .86       1.14
                Within   207   47726.52         38558.95   46884.82   15732.49  206    76.37
                Total    208   48271.52         39189.15   47613.56   15797.91  207

* Significant at the .05 level.
** Significant at the .01 level.
[?] Illegible in the source scan.

Analysis of covariance data are presented 
in Table 3. An inspection of the first F 
column indicates that the two groups cannot 
be said to differ in initial mean score on any 
of the traits or the Total score, since none of 
these F tests are significant. 

When the retest means are adjusted for 
initial mean differences between the Employ- 
ment and Guidance groups, the Employment 
group is found to have a significantly greater 
mean on the retest than the Guidance group 
on Responsibility and Emotional Stability as 
indicated in the second F column in Table 3. 
The Guidance group has a significantly greater 
mean than the Employment group on Socia- 
bility. There is no difference in the retest 
performance of the two groups on Ascend- 
ency or Total score. 

The magnitudes of the significant differences between the adjusted final means are 1.2 for Responsibility and 1.5 for Emotional Stability, in favor of the Employment group, and 1.4 for Sociability, in favor of the Guidance group.
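The adjusted sums of squares in Table 3 follow the usual one-way analysis-of-covariance arithmetic: each adjusted SS is the retest SS minus the squared product SS divided by the pretest SS, and the between-groups value is the adjusted total minus the adjusted within. The Python sketch below (helper names are illustrative) recomputes three of the F ratios from the tabled sums and approximates the adjusted mean differences from the rounded Table 1 means, using the pooled within-group regression slope.

```python
def adjusted_ss(sxx, sxy, syy):
    """Retest SS adjusted for regression on the pretest: Syy - Sxy**2 / Sxx."""
    return syy - sxy ** 2 / sxx


def ancova_f(total, within, df_within):
    """One-way ANCOVA F for two groups from (Sxx, Sxy, Syy) sums.

    The adjusted between-groups SS (1 df) is the adjusted total SS minus
    the adjusted within-groups SS; df_within is N - 2 groups - 1 covariate.
    """
    ss_within = adjusted_ss(*within)
    ss_between = adjusted_ss(*total) - ss_within
    return ss_between / (ss_within / df_within)


def adjusted_mean_diff(retest_diff, pretest_diff, within):
    """Retest mean difference corrected by the pooled within-group slope."""
    sxx, sxy, _ = within
    return retest_diff - (sxy / sxx) * pretest_diff


# (pretest SS, product SS, retest SS) from Table 3; df = 209 - 2 - 1 = 206.
RESP_TOTAL, RESP_WITHIN = (5613.93, 4396.37, 6089.79), (5594.22, 4344.19, 5951.64)
EMOT_TOTAL, EMOT_WITHIN = (7496.96, 5863.94, 7173.92), (7480.15, 5808.41, 6990.45)
SOC_TOTAL, SOC_WITHIN = (8395.20, 7127.14, 9158.22), (8329.89, 7148.71, 9151.10)

f_resp = ancova_f(RESP_TOTAL, RESP_WITHIN, 206)
f_emot = ancova_f(EMOT_TOTAL, EMOT_WITHIN, 206)
f_soc = ancova_f(SOC_TOTAL, SOC_WITHIN, 206)

# Employment minus Guidance: retest and pretest means from Table 1.
diff_resp = adjusted_mean_diff(6.8 - 5.2, 4.3 - 3.7, RESP_WITHIN)
diff_soc = adjusted_mean_diff(4.5 - 4.9, 4.8 - 3.6, SOC_WITHIN)

print(f"F: {f_resp:.2f}, {f_emot:.2f}, {f_soc:.2f}")
print(f"adjusted differences: {diff_resp:.2f}, {diff_soc:.2f}")
```

The recomputed F ratios agree with Table 3 to two decimals (about 5.49, 8.90, and 6.25). The adjusted differences come out near 1.13 and -1.43 rather than 1.16 and -1.41 because Table 1 reports means to only one decimal place.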


Discussion 


The significant increase in means on Re- 
sponsibility and Sociability for the Guidance 
group is somewhat surprising. The group is 
not atypical in that the pretest mean scores 
do not differ significantly from those of the 
Employment group on any of the traits and 
are similar to those reported for a national 
standardization sample of high-school stu- 
dents (6). Windle (5), in his review of test- 
retest effects on personality questionnaires, 
found mean increases in the direction of bet- 
ter adjustment to be fairly common on the 
retest. This occurred even when external 
variables had not been postulated as operat- 
ing to effect test-retest differences. Windle 
mentioned a number of possible intrinsic fac- 
tors that may have accounted for this phe- 
nomenon, but indicated that insufficient evi- 
dence is available to enable him to choose 
from among them. The writer must take the same position in being unable to account for the increase made by the Guidance group in the present study.

The Employment group shows significant mean increases over the Guidance group equivalent to about 8 percentile points in Responsibility and about 10 percentile points in Emotional Stability. The magnitude of the obtained correlations between the guidance and employment administrations indicates that individuals do not substantially change their relative positions on the traits from the Guidance to the Employment testing. In general, these results, obtained in a realistic high-school guidance and employment setting, are very similar to those reported by Rusmore (4) for the same test, using simulated conditions with college students.

In evaluating the present findings, two lim- 
iting factors should be noted. First, since 
the students knew that their original scores 
were on file elsewhere at the time they were 
retested on the Personal Profile, they may 
have been inhibited from faking as much as 
they might have in an initial employment 
administration. Secondly, since the students 
were not candidates for a specific job, but 
rather were indicating desires for particular 
types of work, their motivation to falsify 
might have been reduced or less specific. 
Thus, while these results were obtained under one type of realistic employment condition, caution is warranted regarding their generality. The fakability of
the present test, or forced-choice tests of its 
type, under actual industrial employment 
conditions remains to be determined. 


Summary 


1. The Gordon Personal Profile was ad- 
ministered to junior and senior students in a 
small high school for vocational guidance 
purposes. Three months later, at the end of 
the school year, the test was readministered 
to those students applying for jobs as an em- 
ployment test. Students not wishing jobs 
were given the test again as a guidance test. 

2. Using the Guidance retest group for control purposes, significant increases of about 8 percentile points on Responsibility and about 10 percentile points on Emotional Stability were obtained by the Employment group. The Guidance group obtained a statistically significant increase over the Employment group in Sociability, equivalent to about 9 percentile points.

3. For the Employment group, correlations 
between scores on the guidance and employ- 
ment administrations ranged from .68 to .80. 

4. Thus, individuals did not change their 
profile patterns substantially from a guidance 
situation to an employment situation and 
mean increases for the group were found to 
be moderate. Since the present study was 
performed in a high school situation, how- 
ever, the generality of these findings to actual 
industrial selection remains to be determined. 


Received July 14, 1955. 




References 


1. Gordon, L. V. Gordon Personal Profile. Yonk- 
ers, New York: World Book, 1953. 

2. Gordon, L. V. Manual, Gordon Personal Profile. 
Yonkers, New York: World Book, 1953. 

3. Longstaff, H. P., & Jurgensen, C. E. Fakability 
of the Jurgensen classification inventory. J. 
appl. Psychol., 1953, 37, 86-89. 

4. Rusmore, J. T. Fakability of the Gordon personal profile. J. appl. Psychol., 1956, 40, 175-177.

5. Windle, C. Test-retest effect on personality ques- 
tionnaires. Educ. psychol. Measmt, 1954, 14, 
617-633. 

6. World Book Company. Special report to com- 
munities that participated in the cooperative 
testing program. Yonkers, New York: World 
Book, 1953. 


The Journal of Applied Psychology 
Vol. 40, No. 4, 1956 


A Technique for Increasing the Reproducibility of 
Cumulative Attitude Scales¹


Allen L. Edwards 
The University of Washington 


Various procedures have been described for 
improving the reproducibility of cumulative 
scales designed to measure attitudes and opinions (2, 3, 6, 7). The present study reports
upon the degree of reproducibility obtained 
when the method of paired comparisons is 
used in conjunction with a set of opinion 
statements with known scale values on a fa- 
vorable-unfavorable psychological continuum. 

Assume that N statements with respect to
some issue have been scaled by the method 
of equal-appearing or successive intervals so 
that a scale value representing the degree of 
favorability of each statement is known. A 
smaller set of n statements is now selected
from the larger group of N statements in such 
a way that the scale separations of the state- 
ments are approximately equal. In equal- 
appearing interval scales, respondents are 
presented with the set of statements and 
asked to check whether they agree or disagree 
with each one. Scores on such scales are ob- 
tained by finding the median or mean of the 
scale values of the statements agreed with. 
It has been found, however, that attitude 
scales of the Thurstone equal-appearing in- 
terval variety, in general, have low coeffi- 
cients of reproducibility (2). 

Suppose, however, that each of the n statements is paired with every other statement,
as in the method of paired comparisons. In 
each pair of statements, one statement will 
have a higher, or more favorable, scale value 
than the other. Let the statement with the 
higher scale value in each pair be designated 
as A and the statement with the lower scale 
value as B. These pairs of statements com- 
prise the items in the attitude scale to be 
evaluated. Respondents are asked to choose 
the statement, A or B, in each pair that best indicates how they feel about the issue under consideration. Scores are obtained by counting the number of times the respondent has chosen the more favorable or A statement in the set of n(n - 1)/2 paired comparisons.

1 This [illegible] from the [illegible] School, University of Washington, providing for the statistical analyses which were carried out by Doris Dietze.

It may be hypothesized that a respondent’s 
choice in each of the AB pairs will be a func- 
tion of his own position on an unfavorable- 
favorable attitude continuum corresponding 
to the one on which the statements have been 
scaled. He will choose, in other words, that 
statement in each pair that is closer to his 
own position. The respondent’s position is, 
of course, unknown, and is to be determined 
from the choices he makes when confronted 
with the AB pairs of statements. If a re- 
spondent falls exactly half way between the 
scale values of a given AB pair, his choice 
should be a matter of chance and all such 
choices will contribute to the unreliability of 
the scores obtained from the scale and also 
reduce the degree of reproducibility of the 
item responses from the scores. 
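The hypothesis above implies a simple response rule: a respondent at position p on the continuum chooses A in a pair exactly when p lies above the pair's midpoint, (A + B)/2. The Python sketch below illustrates that rule for a hypothetical error-free respondent, using the nine scale values listed in the next section. Ordering the 36 pairs by midpoint turns every response pattern into a run of A choices followed by B choices, which is why the paired-comparison format can be expected to yield a cumulative scale.

```python
from itertools import combinations

# Scale values of the nine statements used in the study.
SCALE_VALUES = [8.7, 7.8, 6.8, 5.8, 4.9, 4.1, 3.0, 2.0, 1.0]


def make_pairs(values):
    """All n(n - 1)/2 pairs, each ordered (A, B) with A the more favorable."""
    return [(max(a, b), min(a, b)) for a, b in combinations(values, 2)]


def choices(position, pairs):
    """Choices of a hypothetical error-free respondent at `position`.

    Returns 1 where statement A is closer to the respondent, else 0.
    A respondent exactly at a pair's midpoint is scored 0 here, an
    arbitrary tie rule; the text treats such choices as chance.
    """
    return [1 if abs(position - a) < abs(position - b) else 0 for a, b in pairs]


def score(position, pairs):
    """Scale score: the number of pairs in which A is chosen."""
    return sum(choices(position, pairs))


pairs = make_pairs(SCALE_VALUES)
# Midpoints are multiples of .05, so these positions never fall on one.
positions = [0.025 + 0.05 * k for k in range(200)]
scores = [score(p, pairs) for p in positions]

# Sorted by midpoint, each pattern is a run of A choices followed by B choices.
pairs_by_midpoint = sorted(pairs, key=lambda ab: ab[0] + ab[1])
patterns = [choices(p, pairs_by_midpoint) for p in positions]
```

Real respondents depart from this error-free rule, chiefly when p falls near a pair's midpoint, which is where the chance choices described above arise.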


Description of the Scale 


A set of opinion statements relating to the introductory course in general psychology had been scaled by the method of equal-appearing intervals, and two Thurstone-type attitude scales of 20 statements each had been developed by a class in the techniques of attitude-scale construction at the University of Washington in 1948. Equal-appearing interval scale values for the 40 statements were
thus available. From the set of 40 state- 
ments, 9 statements were selected with scale 
values of 8.7, 7.8, 6.8, 5.8, 4.9, 4.1, 3.0, 2.0, 
and 1.0. High scale values correspond to the 
favorable end of the equal-appearing interval 
continuum and low scale values to the unfa- 
vorable end. 

Each of the 9 statements was paired with every other statement to give 9(9 - 1)/2 = 36 pairs of AB statements or items. The pairs of statements in the scale were arranged so that for the odd-numbered pairs the first statement was the A, or more favorable, statement. For the even-numbered pairs the second statement was the A, or more favorable, statement. This arrangement was for scoring convenience, and there is no evidence to indicate that the students subsequently given the scale were aware of the ordering of the pairs of statements.
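The alternating layout amounts to a fixed scoring key. The sketch below assumes each answer is recorded as 1 (first-printed statement) or 2 (second-printed statement); the function name and that coding are hypothetical.

```python
def score_sheet(answers):
    """Score one sheet: answers[i] is 1 or 2, the first- or second-printed
    statement of pair i + 1.  A is printed first in odd-numbered pairs and
    second in even-numbered pairs, so a point is earned when the choice is A."""
    if len(answers) != 36:
        raise ValueError("expected one answer for each of the 36 pairs")
    points = 0
    for pair_number, choice in enumerate(answers, start=1):
        a_slot = 1 if pair_number % 2 == 1 else 2
        points += int(choice == a_slot)
    return points


# Hypothetical sheets for illustration.
always_first = [1] * 36                                    # A only in odd pairs
always_a = [1 if i % 2 == 1 else 2 for i in range(1, 37)]  # A every time
always_b = [2 if i % 2 == 1 else 1 for i in range(1, 37)]  # B every time
```

A respondent who always marks the first statement would score 18, the midpoint of the 0-36 range, so a position habit alone does not push scores toward either end.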


Procedure and Results 


The scale was given to approximately 370 
students in the introductory psychology course 
at the University of Washington during the 
last two weeks of the spring quarter in 1953. 
The students were asked to choose the state- 
ment in each of the 36 pairs that best ex- 
pressed how they felt about the introductory 
course. They were not asked to sign their 
names to their papers in order to provide as- 
surance that their responses would have no 
influence on their grades in the course. 

Some students failed to respond to every 
item in the scale and their papers were dis- 
carded, leaving a total of 349 papers. These 
papers were divided into two groups of 175 
and 174 by taking alternate papers. All sta- 
tistical analyses to be reported were done 
with the first group of 175 papers and the results then checked with the second group of 174 papers.

The 175 papers in the first group were 
scored by giving one point each time the stu- 
dent chose the more favorable or A statement 
in the 36 AB pairs. For each of the 36 pairs 
of statements, the proportion of favorable or 
A responses was then found by counting the 
number of students choosing the A statement 
and dividing by the total number of students. 
The items or pairs of statements were then 
arranged in rank order of the proportion of 
favorable responses and the predicted re- 
sponse patterns for each score were deter- 
mined in the manner described by Edwards 
(1). 

An error of prediction was counted each time an observed response to a given item failed to correspond to the predicted response for that item in terms of the score on all items. Predictions were made for a total of (175)(36) = 6,300 responses, with 711 being in error. The proportion of errors was .113, and the coefficient of reproducibility was equal to 1 - .113, or .887. The coefficient of reproducibility of .887 obtained with this set of 36 items compares favorably with the coefficients of reproducibility customarily reported for attitude scales with many fewer items.
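The error count can be sketched in Python as follows, assuming the usual Guttman procedure: items are ranked by the proportion of A responses, a respondent with score s is predicted to choose A on the s most popular items and B on the rest, and each mismatch is one error. This is an illustration, not a reproduction of the procedure in Edwards (1). The optional item_order argument lets a second group be scored with the first group's ranking, as in the cross-check described below.

```python
def reproducibility(responses, item_order=None):
    """Errors of prediction and coefficient of reproducibility.

    responses: one list of 0/1 per respondent (1 = chose the A statement).
    item_order: item indices from most to least often chosen A; if None it
    is computed from `responses` itself.  Returns (errors, predictions, CR).
    """
    n_people, n_items = len(responses), len(responses[0])
    if item_order is None:
        p = [sum(row[j] for row in responses) / n_people for j in range(n_items)]
        item_order = sorted(range(n_items), key=lambda j: p[j], reverse=True)
    errors = 0
    for row in responses:
        predicted_a = set(item_order[:sum(row)])  # score s -> s most popular items
        errors += sum(row[j] != (j in predicted_a) for j in range(n_items))
    predictions = n_people * n_items
    return errors, predictions, 1 - errors / predictions


# Error-free cumulative data, then data with one out-of-pattern respondent.
perfect = [[0, 0, 0], [1, 0, 0], [1, 1, 0], [1, 1, 1]]
one_misfit = [[1, 1, 0, 0], [1, 0, 0, 0], [1, 1, 1, 0], [0, 1, 0, 0]]

# The reported figures: 711 errors in (175)(36) = 6,300 predictions.
reported_cr = 1 - 711 / (175 * 36)
```

For the reported figures, 1 - 711/6,300 ≈ .887.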

For the same set of 36 items, the Kuder-Richardson (5) formula 20 estimate of reliability was obtained. This coefficient was .869, and it also compares favorably with the reliability coefficients reported by Edwards and Kenney (4) for attitude scales constructed by the method of equal-appearing intervals and the method of summated ratings.
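Kuder-Richardson formula 20 uses only the item proportions p and the variance of total scores: KR-20 = k/(k - 1) × (1 - Σpq / σ²). A minimal Python sketch, assuming the population (divide-by-N) variance so that it matches the pq terms; the function name is illustrative.

```python
def kr20(responses):
    """Kuder-Richardson formula 20 for dichotomously scored (0/1) items.

    KR-20 = k/(k - 1) * (1 - sum(p*q) / var_total), where var_total is the
    population (divide-by-N) variance of the total scores.
    """
    n = len(responses)
    k = len(responses[0])
    p = [sum(row[j] for row in responses) / n for j in range(k)]
    sum_pq = sum(pj * (1 - pj) for pj in p)
    totals = [sum(row) for row in responses]
    mean_total = sum(totals) / n
    var_total = sum((t - mean_total) ** 2 for t in totals) / n
    return k / (k - 1) * (1 - sum_pq / var_total)


cumulative = [[0, 0, 0], [1, 0, 0], [1, 1, 0], [1, 1, 1]]
```

For instance, the four cumulative patterns 000, 100, 110, and 111 give KR-20 = .75.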

In order to check the results obtained and 
reported upon above, the second set of 174 
papers was scored. The response patterns 
predicted for each of the scores in this group 
of papers were based upon the proportions of 
favorable responses obtained in the first set 
of 175 papers. The errors of prediction were 
thus obtained independently of any consid- 
eration of the proportions of favorable re- 
sponses given by the members of the second 
group. For the second group the proportion 
of errors was .121 and the coefficient of reproducibility was 1 - .121, or .879. The
Kuder-Richardson estimate of reliability for 
the second set of papers was .883. 

It has been found previously that statements with scale values in the “neutral” or middle section of the favorable-unfavorable equal-appearing interval continuum tend to contribute to error, and thus to lower reproducibility, more than statements scaled toward the two extremes of the continuum (2). For this reason, it seemed worthwhile to check the coefficient of reproducibility obtained when the two statements with scale values of 5.8 and 4.1 were eliminated from the set of 9 statements. Using only the 7(7 - 1)/2 = 21 paired comparisons, the two sets of 175 and 174 papers were rescored. Response patterns and errors of prediction for the first group of 175 papers were obtained as before. The coefficient of reproducibility for the 21-item scale was, as expected, somewhat higher and equal to .914 for the first set of papers. The Kuder-Richardson estimate of reliability was .829.

Using the proportions of favorable responses 
given in the first set of papers and the re- 
sponse patterns based upon these proportions, 
the errors of prediction for the second set of 
174 papers were obtained. For the second 
set of papers the coefficient of reproducibility 
was .904 and the Kuder-Richardson estimate 
of reliability was .861. 


Summary 


The results reported would seem to indicate 
that using the method of paired comparisons 
in conjunction with a set of opinion state- 
ments with known scale values on a favor- 
able-unfavorable continuum has promise for 
the construction of attitude scales with a rela- 
tively high degree of reproducibility and 
satisfactory reliability. 


Received November 10, 1955. 


References

1. Edwards, A. L. On Guttman’s scale analysis. Educ. psychol. Measmt, 1948, 8, 313-318.

2. Edwards, A. L., & Kilpatrick, F. P. Scale analysis and the measurement of social attitudes. Psychometrika, 1948, 13, 99-114.

3. Edwards, A. L., & Kilpatrick, F. P. A technique for the construction of attitude scales. J. appl. Psychol., 1948, 32, 374-384.

4. Edwards, A. L., & Kenney, Katherine C. A comparison of the Thurstone and Likert techniques of attitude scale construction. J. appl. Psychol., 1946, 30, 72-83.

5. Kuder, G. F., & Richardson, M. W. The theory of the estimation of test reliability. Psychometrika, 1937, 2, 151-160.

6. Loevinger, Jane. The technic of homogeneous tests compared with some aspects of “scale analysis” and factor analysis. Psychol. Bull., 1948, 45, 507-529.

7. Stouffer, S. A., Borgatta, E. F., Hays, D. G., & Henry, A. F. A technique for improving cumulative scales. Publ. Opin. Quart., 1952, 16, 273-291.




The Relationship Between Item Ambiguity and 
Discriminating Power in a Forced-Choice Scale¹


Eleanore S. Isard 


Temple University 


Controlling bias or misrepresentation in 
self-rating scales is a perennial problem in 
the field of measurement. One of the newer 
techniques—the forced-choice—attempts to 
handle this problem through a procedure of 
item collection based upon statements made 
by representative elements of the population 
for whom the scale is being built, and through 
pairing of items on the basis of two computed 
indices: Preference and Discrimination. 

Item collection for a forced-choice scale 
was discussed by Gekoski and Isard in a re- 
cent note on a new use of the sentence com- 
pletion technique (2). The problem of pair- 
ing items on the basis of equal apparent 
favorableness or social acceptability (Prefer- 
ence Index) and significance for the criterion 
(Discrimination Index) has received some 
attention in the literature. These reports 
have dealt essentially with the discriminat- 
ing power of positive versus negative or so- 
cially acceptable versus socially unacceptable 
items (3, 4, 6, 7). While this paper also concerns itself with the relationship between
these two indices, the primary interest is 
with the discriminating power of so-called 
ambiguous items. 


Procedure 
Preference Indices 


Through the use of the essay method (8, 9), 
modified by critical incident (1), and the sentence 
completion method (2), a Student Questionnaire of 
284 statements of opinion toward school experience 
was constructed. Following several preliminary in- 
vestigations into the effects of size of sample and 
wording of instructions on item preference indices, 
the Student Questionnaire was administered to four 
samples of college freshmen and sophomores (total 
N=84). These students were instructed to rate 
each item on a five-point scale of social acceptability ranging from 1 (Highly Acceptable), through 3 (Neutral), to 5 (Highly Unacceptable).

1 This paper is based in part on the writer’s Ph.D. dissertation at Temple University (5). The writer is indebted to Drs. Roy B. Hackman, Norman Gekoski, and Harold C. Reppert for their support and encouragement throughout this study.

A Preference Index was obtained for each item by 
computing the mean scale value from the responses 
of the total sample of 84 students. Prior to this, 
application of z tests of the significance of the differences in mean preference indices obtained with each of the four subsamples indicated that the requirement of stability had been met. Items having a mean scale value (Preference Index) of 3.00 ± .50
were designated Neutral; those below these limits, 
Positive; and those above, Negative. The latter 
terms are used interchangeably with “Socially Ac- 
ceptable” and “Socially Unacceptable,” respectively. 
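The classification rule can be sketched as below. The text does not say how values exactly at 2.50 or 3.50 were handled, so treating them as Neutral is an assumption; the helper names are hypothetical. The ±.10 matching tolerance used later for building the tetrads is included for convenience.

```python
from statistics import mean


def preference_index(ratings):
    """Mean of 1-5 social-acceptability ratings (1 = Highly Acceptable)."""
    return mean(ratings)


def preference_category(index, center=3.00, band=0.50):
    """Neutral within 3.00 +/- .50 (boundaries treated as Neutral here);
    lower values are Positive, higher values Negative."""
    if index < center - band:
        return "Positive"
    if index > center + band:
        return "Negative"
    return "Neutral"


def can_pair(index_a, index_b, tolerance=0.10):
    """True when two Preference Indices match within +/- tolerance.
    The small epsilon keeps float rounding (e.g. 2.96 - 2.86) from
    rejecting a difference of exactly .10."""
    return abs(index_a - index_b) <= tolerance + 1e-9
```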


Discrimination Indices 


Additional samples were collected from the same 
college population for obtaining discrimination indices. The instructions in this step were that the
students respond to each statement in the 284-item 
questionnaire only in terms of whether they per- 
sonally agreed or disagreed with it. The validating 
criterion was one-semester grade-point averages com- 
puted from the official transcripts of the participat- 
ing students. Two criterion groups, each containing 
50 subjects equated on college aptitude test per- 
centile rank, were established: (a) Achievers—those 
with grade-point averages of 2.00 (“C”) or higher, 
and (b) Nonachievers—those with grade-point av- 
erages of less than 2.00. Phi coefficients based on 
the item-analysis data for the 100 subjects were com- 
puted as indices of item validity. A phi coefficient 
of .26 or higher was tentatively established as the 
criterion for inclusion of a discriminating item in 
the forced-choice inventory. It was found that this 
represents the very rigorous standard of discrimina- 
tion at the .01 level based upon the chi-square test 
of significance. 
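The .26 cutoff can be checked with a little arithmetic: for a 2 × 2 table, χ² (1 df, no continuity correction) equals Nφ², so with N = 100 a phi of .26 gives χ² = 6.76, just above the tabled .01 value of 6.635 for 1 df. A minimal Python sketch (names are illustrative):

```python
import math

CHI2_01_DF1 = 6.635  # tabled chi-square for 1 df at the .01 level


def phi_2x2(a, b, c, d):
    """Phi coefficient for the 2 x 2 table [[a, b], [c, d]], e.g.
    rows = Achievers / Nonachievers, columns = agree / disagree."""
    return (a * d - b * c) / math.sqrt((a + b) * (c + d) * (a + c) * (b + d))


def chi_square_from_phi(phi, n):
    """For a 2 x 2 table (no continuity correction), chi-square = N * phi**2."""
    return n * phi ** 2


# With 100 subjects, phi = .26 gives chi-square = 6.76, just past 6.635.
chi2_at_cutoff = chi_square_from_phi(0.26, 100)
chi2_just_below = chi_square_from_phi(0.25, 100)
```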


Forced-Choice Scales 


By matching discriminating items with nondiscriminating items on the basis of equal Preference Index (plus or minus .10), it was possible to construct two forced-choice inventories, Form AA and Form ML. These differed in instructions and scoring procedure. Form AA consisted of 15 tetrads and had an approximately equal number of Positive, Neutral, and Negative items. The subjects were instructed to select from each tetrad the two items with which they Most Agreed. Form ML was a 12-tetrad inventory consisting almost exclusively of so-called Neutral items. On this form, the students were instructed to select from each tetrad one item with which they Most Agreed and one with which they Least Agreed. An example of one tetrad from each of the forms appears below. For the reader’s convenience, preference and discrimination indices are presented beside each statement. The discriminating items are asterisked.
Preference  Discrimination
Index       Index (φ)

Form AA

1.89          .00     1. College standards should be at a level that will produce good students.
1.96         -.47*    2. Reading and studying should be taught as subjects to college freshmen.
1.96         -.31*    3. Textbooks should be written so that the average student can understand them without help.
1.89         -.14     4. The ideal test requires an application of facts and knowledge to practical situations.

Form ML

2.96          .00     1. If a student feels he has not been given a fair deal in a test, he should take the matter up with the Dean.
2.90         -.44*    2. Most instructors prefer essay type tests so that they can have more leeway in marking them.
2.95          [?]     3. Most students will try to get away with as little work as they can.
2.86          .19     4. Most textbooks are too long and dull.

[?] Illegible in the source scan.


Both forms were administered to two groups of 
30 Achievers and 47 Nonachievers. Mean scores 
were computed for each group on each form. As 
a result of the findings, further study of Form ML 
was undertaken with a new sample of 100 college 
freshmen and sophomores. In addition to retesting for reliability, a biasability study was performed
with 39 highly motivated volunteer subjects. The 
instructions to bias were as follows: 

“Assume that the score you now obtain on this inventory will determine whether or not you are permitted to remain in college. In selecting your answers, be guided by the assumption that the University will keep only those students who have attitudes toward the administration, the instructors, the student body, etc., that are like those that good students have expressed. Therefore, select from each tetrad, as being the item with which you most agree and the item with which you least agree, those which will place you in the most favorable light with the University, i.e., those that you feel will agree with the key based upon the attitudes of good students.”


Table 1

Preference Index Distribution of Discriminating and Nondiscriminating Items in the Student
Questionnaire of Attitudes Toward School Experience

                            Preference Category
Items               Positive   Neutral   Negative   Total
Discriminating            11        33         12      56
Nondiscriminating         94        67         67     228
Total                    105       100         79     284


Discussion and Results 


Table 1 shows the number of discriminat- 
ing and nondiscriminating items found in the 
Student Questionnaire for each of the three 
preference categories. 

In Table 1, χ² equals 17.8264, which is significant at the .01 level. This indicates that there is a highly significant relationship between type of item (Preference Index) and discriminating power. It should be pointed out that half of the contribution to χ² comes from the Neutral category for discriminating items. In terms of percentages, of the 56 discriminating items, 20% had been perceived as Socially Acceptable and 21% as Socially Unacceptable. The remaining 59% were, on face value, apparently Neutral.
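The χ² for Table 1 can be recomputed directly from the counts. The Python sketch below is a plain Pearson test of independence without continuity correction (the text does not state the exact computation); it returns each cell's contribution so the share coming from the Neutral discriminating cell can be checked. Names are illustrative.

```python
def chi_square_contingency(table):
    """Pearson chi-square for an r x c table of counts.

    Returns (chi_square, df, cell contributions); expected counts are
    row total * column total / N.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    cells = [
        [(obs - rt * ct / n) ** 2 / (rt * ct / n) for obs, ct in zip(row, col_totals)]
        for row, rt in zip(table, row_totals)
    ]
    df = (len(table) - 1) * (len(table[0]) - 1)
    return sum(map(sum, cells)), df, cells


# Table 1 counts: columns Positive, Neutral, Negative.
observed = [[11, 33, 12],   # discriminating
            [94, 67, 67]]   # nondiscriminating
chi2, df, cells = chi_square_contingency(observed)
neutral_share = cells[0][1] / chi2   # discriminating x Neutral cell
CHI2_01_DF2 = 9.210  # tabled chi-square for 2 df at the .01 level
```

It gives χ² ≈ 17.83 on 2 df, agreeing with the reported 17.8264 to within rounding and well above the .01 criterion of 9.21; the Neutral discriminating cell supplies about half of the total.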

The next logical step appeared to be an 
examination of the graphic item counts for 
the Neutral category. This examination re- 
vealed that these so-called Neutral items had, 
in fact, not been rated “3” by the majority of 
subjects but, rather, had been assigned rat- 
ings ranging from Highly Acceptable (“1”) 
to Highly Unacceptable (“5”). It would ap- 
pear, then, that these statements, perceived 
by some as socially acceptable, by others as 
neither acceptable nor unacceptable, and by 
still others as socially unacceptable, might more accurately be labeled Ambiguous. Therefore, it might be assumed that a projective principle was operating.²

2 The possibility of “unclear” personality items
serving as “miniature projective tests” was suggested 
by Gordon (3). 




The results of the preliminary investigation 
of Form AA indicated that it was not doing 
the job for which it had been designed. The 
mean score obtained by the achievers was 
the same, within a fraction of a point, as that 
obtained by the nonachievers. Form ML, on the other hand, appeared to show promise in the preliminary run. In the subsequent study, this form was found to have substantial validity (biserial r of .66 and .61 for test and retest, respectively) with equated samples of 50 achievers and 50 nonachievers in the test situation and 46 achievers and 46 nonachievers in the retest situation. Test-retest reliability was .76 ± .04; the standard error of measurement was 3.17, with a possible total score range of −24 to +24. An item analysis of Form
ML revealed that, in general, items which 
discriminated in the questionnaire format 
either held up or became more valid in this 
forced-choice format. 
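For readers checking the reliability figures, the classical test-theory relation between the standard error of measurement and test-retest reliability can be rearranged to show the score spread the reported values imply. This back-calculation is our own illustration and assumes the standard error was computed with this formula; the study does not report the standard deviation of Form ML scores.

$$
SE_{\text{meas}} = \sigma_x\sqrt{1 - r_{tt}}
\quad\Longrightarrow\quad
\sigma_x = \frac{SE_{\text{meas}}}{\sqrt{1 - r_{tt}}} = \frac{3.17}{\sqrt{1 - .76}} \approx \frac{3.17}{0.490} \approx 6.5
$$

On a possible range of −24 to +24, a standard deviation of roughly 6.5 points would be consistent with the stated reliability and standard error.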

The results of the biasability study with 
Form ML did not warrant detailed statistical 
treatment, the mean difference between scores 
obtained under standard instructions and 
those obtained under instructions to bias 
being −0.36.


Summary 


The study reported here was based, in part, 
upon a larger one which concerned itself with 
the development of a forced-choice inventory 
of attitudes for predicting scholastic achieve- 
ment in college. The purpose of this paper 
was to report the findings on the relationship 
between Discrimination Indices and Prefer- 
ence Indices, with special emphasis on am- 
biguous items. The results may be summa- 
rized as follows: (a) In questionnaire format, 
Ambiguous statements were more valid than 
either Positive or Negative statements for 
differentiating college achievers from non- 
achievers. (b) In general, the validity of 
Ambiguous items either held up or increased 
in forced-choice format. (c) The 12-tetrad inventory consisting almost exclusively of Ambiguous items was found to have substantial reliability and validity for the purpose
used, and did not appear to lend itself to 
willful misrepresentation on the part of the 
subjects. It was suggested that, in the use 
of Ambiguous statements of opinion toward 
school experience, a projective principle is 
called into operation. Furthermore, it is 
highly probable that the very ambiguity of 
the statements accounts for the failure of the 
subjects to intentionally bias (i.e., increase) 
their scores. 

In addition to more cross-validation stud- 
ies, structured interviews with the students 
who took part in the study may help to shed 
more light on the nature of the statements as 
well as on the reasons for the responses they 
evoke. Implications for counseling or atti- 
tudinal orientation are manifest. 


Received February 13, 1956. 
Early Publication. 


References 


1. Flanagan, J. C. The critical incident technique. 
Psychol. Bull., 1954, 51, 327-358. 

2. Gekoski, N., & Isard, Eleanore S. Note on another use of the sentence-completion technique. J. appl. Psychol., 1955, 39, 139.

3. Gordon, L. V. Some interrelationships among personality item characteristics. Educ. psychol. Measmt, 1953, 13, 264-272.

4. Highland, R. W., & Berkshire, J. R. A methodological study of forced-choice performance ratings. USAF, Personnel Train. Res. Cent., Res. Bull., 1951, No. 51-9.

5. Isard, Eleanore S. The development of a forced-choice inventory of attitudes toward school experience for predicting scholastic achievement in college. Dissertation Abstr., 1955, 15, No. 8.

6. Lanman, R. W., & Remmers, H. H. The “prefer- 
ence” and “discrimination” indices in forced- 
choice scales. Educ. psychol. Measmt, 1954, 
14, 541-551. 

7. Parris, H. L. A comparative study of forced- 
choice and check-list ratings of Air Force 
R.O.T.C. instructors. Unpublished doctor's 
dissertation, Ohio State Univer., 1951. 

8. Rundquist, E. A. The forced-choice technique and rating scales. Paper read at Amer. Psychol. Ass., Philadelphia, September, 1946.

9. Sisson, E. D. Forced-choice—the new Army rat- 
ing. Personnel Psychol., 1948, 1, 365-381. 


The Journal of Applied Psychology 
Vol. 40, No. 4, 1956 


Using “Mark Sense” for Ratings and Personal 
Data Collection 


Bernard M. Bass 


Louisiana State University 


and Cecil R. Wurster 


Division of Research, Louisiana Department of Institutions 


The use of IBM “mark sense” cards has 
been adopted by the newly reorganized Divi- 
sion of Research of the Louisiana State De- 
partment of Institutions in its various report- 
ing and research projects. Mark-sensing is a 
procedure by which a specially printed IBM 
card (see Fig. 1) is marked with an electro- 
graphic pencil to indicate specific given in- 
formation. Through the use of the Mark 
Sensing Reproducer these marks are auto- 
matically converted into punched holes in the 
card corresponding to the relative positions 
of the pencil marks. The punched hole thus 
assumes the same numerical value as the 
mark which was originally made. This pro- 
cedure permits the collection of record or re- 
search data on IBM cards at the source of 
the data without any additional clerical or 
key punch work. Such a procedure elimi- 
nates all possible sources of clerical error in 
transferring data from original record to 
finally punched IBM card. 

All types of routine data on several hundred 
variables are now gathered on each of the 
100,000 yearly admissions to the state’s men- 
tal, tuberculosis and general hospitals, guid- 
ance centers, and correctional institutions by 
means of mark sense cards filled out by appropriate personnel at each institution. Discharge and follow-up cards provide a detailed picture of each patient’s institutional history
which can be collated immediately, with no 
clerical labor, with the information obtained 
when the patient was admitted. Twenty- 
seven mark-sense columns are available on 
each card. Double marking of columns per- 
mits as many as 36 alternative responses per 
column. An example of a tuberculosis pa- 
tient admission card is shown in Fig. 1 to 
illustrate the types of classification possible. 
This card has been marked and processed 


through the reproducer to illustrate the conversion of marks to punches. (All heavy bars indicate columns which are to be “double marked.”)
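The text does not say how doubly marked columns are coded. One plausible reading, offered here only as an assumption, is the standard IBM zone-plus-digit card code of the period. In that code a single mark in one of the 12 rows (12, 11, 0 through 9) gives 12 alternatives, and a zone mark (12, 11, or 0) combined with a digit mark gives the 26 letters. Together with the ten digits, that yields exactly 36 codes. The sketch below builds that table and decodes a set of marked rows; the function names are hypothetical.

```python
# Rows of a standard 12-row IBM card, top to bottom.
ROWS = ["12", "11", "0", "1", "2", "3", "4", "5", "6", "7", "8", "9"]


def card_code_table():
    """Map each digit and letter to the card rows marked (punched) for it.

    Assumes the standard IBM zone-plus-digit card code: digits use one row;
    letters use a zone row (12, 11, or 0) plus a digit row.
    """
    table = {str(d): (str(d),) for d in range(10)}
    for zone, letters, first_digit in (
        ("12", "ABCDEFGHI", 1),  # A = 12-1 ... I = 12-9
        ("11", "JKLMNOPQR", 1),  # J = 11-1 ... R = 11-9
        ("0", "STUVWXYZ", 2),    # S = 0-2 ... Z = 0-9
    ):
        for offset, letter in enumerate(letters):
            table[letter] = (zone, str(first_digit + offset))
    return table


CODE = card_code_table()
# Reverse lookup: the set of marked rows identifies the character.
DECODE = {frozenset(rows): char for char, rows in CODE.items()}


def decode_column(marked_rows):
    """Return the character for a set of marked rows, or None if not in the table."""
    return DECODE.get(frozenset(marked_rows))


print(len(ROWS), "single-mark positions;", len(CODE), "codes when zone + digit marks are allowed")
print(decode_column({"11", "4"}), decode_column({"0", "9"}), decode_column({"7"}))
```

Under this reading, a classification with more than 12 mutually exclusive alternatives needs double marks. That fits the later remark that translation into codes is necessary only in that case.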

Figure 2 is a correctional school exit card 
and illustrates the use of three-point rating 
scales in conjunction with personal data collection.

More specific uses also are being made of 
mark sensing. Each week, as part of an ex- 
tensive, controlled study by the Department 
of Institutions Staff Committee on Mental 
Health Research to evaluate the effects of 
thorazine and reserpine therapies in the treat- 
ment of mental patients, every one of ap- 
proximately 400 patients at one hospital and 
300 at each of two other hospitals are being 
“mark sense” rated by from one to four phy- 
sicians, nurses, and attendants on 32 items of 
behavior. These “mark sense” SELH Rat- 
ing Scale Cards, developed by Frederick 
Hine, psychiatrist, and Joseph Dawson, clini- 
cal psychologist, are marked directly by the 
raters. Patient codes are prepunched into 
the cards for collation purposes. Little cleri- 
cal work precedes or follows the data collec- 
tion to obtain final research results. 

Statistical analyses can be prepared on 
IBM machines directly from the punched 
cards. For example, from the behavior rat- 
ing cards described above, an analysis of the 
agreement between the varying numbers of 
raters, using the Horst reliability formula 
(1), for each of the 32 items on each of six 
successive weeks was computed by an IBM 
604 Electronic Computer. (This computer 
performs all basic arithmetical computations 
—addition, subtraction, multiplication, and 
division.) The final 192 Horst reliability 
coefficients were obtained for a varying sam- 
ple of 100 to 150 patients about 12 work 




hours after the ratings were made at a cost 
of $153.00. (By hand calculator, the same 
analysis would have required around 600 
days.) While these computations were in 
progress, machine duplications of the ratings 
were ready for various other types of analy- 
sis by IBM equipment. 
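The formula of Horst (1) is not written out in the text, so the sketch below does not implement it. As a plainly swapped-in stand-in for the same kind of question (agreement when each patient is rated by a different number of raters), it computes the standard one-way random-effects intraclass correlation for a single rating. For unequal numbers of ratings per patient it uses the usual adjusted average group size, n0. The ratings in the example are made up, and the function name is our own.

```python
def icc_unequal_raters(ratings_by_patient):
    """One-way random-effects intraclass correlation, ICC(1), for one item.

    ratings_by_patient: list of lists; each inner list holds the ratings one
    patient received (any number of raters per patient).
    This is a standard ANOVA estimator, not Horst's (1949) expression.
    """
    k = len(ratings_by_patient)
    sizes = [len(r) for r in ratings_by_patient]
    n_total = sum(sizes)
    if k < 2 or n_total <= k:
        raise ValueError("need two or more patients and at least one patient with 2+ ratings")
    grand_mean = sum(sum(r) for r in ratings_by_patient) / n_total
    means = [sum(r) / len(r) for r in ratings_by_patient]
    ss_between = sum(n * (m - grand_mean) ** 2 for n, m in zip(sizes, means))
    ss_within = sum((x - m) ** 2 for r, m in zip(ratings_by_patient, means) for x in r)
    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n_total - k)
    # Adjusted average number of ratings per patient for unequal group sizes.
    n0 = (n_total - sum(n * n for n in sizes) / n_total) / (k - 1)
    return (ms_between - ms_within) / (ms_between + (n0 - 1) * ms_within)


# Hypothetical ratings of one behavior item: patients rated by 2, 3, and 1 raters.
example = [[1, 2], [3, 4, 5], [2]]
print(round(icc_unequal_raters(example), 4))
```

Applied once per item per week, a function of this kind would be called 32 × 6 = 192 times, matching the number of coefficients mentioned in the text.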

Fig. 1. IBM mark sense Tuberculosis Patient Admission Card showing electrographic pencil marks converted into punched holes.

“Mark sense” lends itself to self-rating, test response scoring, and sociometric data collection. The senior author collected the rank order solutions by 300 subjects to each of 12 problems. Self-ratings and buddy ratings were also collected. Desired analyses of the data, which would have required 40 years of steady work by hand calculator, were completed in approximately three working days


following the data collection. The IBM 650, 
a computer with greater speed and storage 
capacity than the IBM 604, carried out the 
calculations. (The 604’s “instructions” are 
changed mainly by rewiring; the 650’s in- 
structions are changed mainly by a punched 
deck of cards. Once the deck is assembled, 
the 650 runs itself.) 

By making use of mark sense procedure where he now collects data on printed forms, the applied psychologist might find his research speeded up immediately with no loss in accuracy and at considerable savings in clerical costs. Since each response can be labeled and defined as desired just below the space to be marked, the procedure obviates the need for translating classifications into codes to be punched, one of the main sources of error in traditional “printed form-to-IBM key punch” operations. Translation is necessary only for classifications involving more than 12 alternatives which are mutually exclusive.

Fig. 2. IBM mark sense Correctional School Exit Card showing the use of rating scales in combination with personal history data.

Access to modern high-speed computers 
multiplies the value of “mark sensing.” We 
no longer need worry about the expense and 




difficulty of coding and key punching large 
volumes of raw data, ordinarily required be- 
fore we can take advantage of the computers. 


Received October 13, 1955. 


Reference 


1. Horst, P. A generalized expression for the reli- 
ability of measures. Psychometrika, 1949, 
14, 21-31. 


The Journal of Applied Psychology 
Vol. 40, No. 4, 1956 


The Application of Temporal Correlation Techniques 
in Psychology 


W. Jay Merrill, Jr.¹ and Corwin A. Bennett


International Business Machines Corporation, Endicott, New York 


For a number of years methods have ex- 
isted and been applied for correlating vari- 
ables displaced in time from each other. Re- 
cent studies suggest that such methods of 
correlating over time will be of increasing 
importance in psychology during the next 
few years. Time as a variable in psychology 
has, with few exceptions, been the domain of 
investigations usually designated under the 
broad classifications of “learning” and “fa- 
tigue.” In experimentation not specifically 
concerned with these topics, changes in be- 
havior over time have generally been “prob- 
lems”—factors to be eliminated by some 
(sometimes devious) means. Consider the 
following hypothetical example. 

A psychologist is investigating the effects 
of illumination on productivity of beginning 
punch-press operators. Plotting the perform- 
ance of one operator under one condition of 
illumination against days on the job, he finds 
the relationship shown in Fig. 1, Curve A. 
If the investigator is concerned with learning 
as it affects productivity, he may be disap- 
pointed at the irregularities in his data, and 
attempt to reduce these irregularities by av- 
eraging the productivity of several operators 
to obtain a “smooth curve.” If he is not 
concerned with learning in his investigation, 
he will probably average together the several 
days on the job in order to eliminate the 
problem of the time variable. These are com- 
mon procedures and not necessarily worthy 
of deprecation. Such procedures may, how- 
ever, serve to preclude the discovery of im- 
portant behavioral relationships. 

To return to the hypothetical psychologist, suppose that he attempts to fit a “learning curve” to his data. He might find that a curve of the form $P = k_1(1 - e^{-k_2 t})$ (where $k_1$ and $k_2$ are empirical constants) would

1 The senior author wishes to express his gratitude 
to P. M. Fitts of Ohio State University for support 
during the early stages of the writing of this paper. 


suffice (Curve B, Fig. 1). This would usu- 
ally be the final analysis of such data. If, 
however, the deviations from the fitted curve 
were plotted against trials, Curve C, Fig. 2, 
would result. An imaginative investigator 
might look at this plot and think that there 
were periodicities or cycles of performance 
present. He might then fit a second curve 
of the form $P = k_3 \sin(k_4 t)$, where $k_3$ and $k_4$ are constants, as in Curve D, Fig. 2. Depending on many factors, some psychological explanation for such a periodicity might be suggested.
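The two-stage curve fitting in the hypothetical example can be sketched with simple grid searches. For a trial $k_2$ (or $k_4$), the best $k_1$ (or $k_3$) is a closed-form least-squares coefficient, so no optimization library is needed. The productivity series below is synthetic, and the grid ranges and helper names are arbitrary choices for this illustration.

```python
import math


def best_scale(basis, data):
    """Least-squares coefficient c minimizing sum((data - c * basis)^2), and that SSE."""
    bb = sum(b * b for b in basis)
    c = sum(b * d for b, d in zip(basis, data)) / bb
    sse = sum((d - c * b) ** 2 for b, d in zip(basis, data))
    return c, sse


def fit_learning_curve(t, p, k2_grid):
    """Fit P = k1 * (1 - exp(-k2 * t)) by grid search over k2."""
    best = None
    for k2 in k2_grid:
        k1, sse = best_scale([1 - math.exp(-k2 * ti) for ti in t], p)
        if best is None or sse < best[2]:
            best = (k1, k2, sse)
    return best[0], best[1]


def fit_sine(t, r, k4_grid):
    """Fit R = k3 * sin(k4 * t) by grid search over k4."""
    best = None
    for k4 in k4_grid:
        k3, sse = best_scale([math.sin(k4 * ti) for ti in t], r)
        if best is None or sse < best[2]:
            best = (k3, k4, sse)
    return best[0], best[1]


def rms(values):
    return math.sqrt(sum(v * v for v in values) / len(values))


# Synthetic "days on the job" data: a learning curve plus a small cycle.
days = list(range(1, 61))
output = [5 * (1 - math.exp(-0.3 * d)) + 0.5 * math.sin(0.8 * d) for d in days]

k1, k2 = fit_learning_curve(days, output, [i / 100 for i in range(1, 101)])
resid1 = [p - k1 * (1 - math.exp(-k2 * d)) for d, p in zip(days, output)]   # like Curve C
k3, k4 = fit_sine(days, resid1, [i / 100 for i in range(5, 201)])            # like Curve D
resid2 = [r - k3 * math.sin(k4 * d) for d, r in zip(days, resid1)]

print(f"k1={k1:.2f} k2={k2:.2f} k3={k3:.2f} k4={k4:.2f}")
print(f"rms after trend: {rms(resid1):.3f}, after trend + cycle: {rms(resid2):.3f}")
```

The residuals from the first fit form the detrended series on which the temporal correlation methods described next would operate.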

The original hypothetical data are a time 
series. The deviations from the first fitted 
curve are a stationary time series since any 
“over-all tendencies” or trends have been re- 
moved from the data by the first curve-fitting 
process. The deviations from the second 
fitted curve are often treated as “random 
fluctuations.” Temporal correlation tech- 
niques are means of finding relationships be- 
tween some sort of performance and time for 
a stationary time series.² Not only cyclical
relationships as in the present case but non- 
cyclical temporal phenomena may be discov- 
ered by temporal correlation techniques. 


Definitions 


Temporal correlation techniques may be divided into two general classes: (a) discrete serial correlations, usually appropriate for psychological data; (b) continuous correlations, such as the autocorrelation and crosscorrelation functions common at present in engineering applications. Each of these classes of temporal correlation may be further subdivided into two analogous classes: (i) autocorrelation, the correlation of a variable with itself displaced in time; (ii) crosscorrelation, the correlation of one variable with a second variable displaced in time.³

2 A time series may be stationary either because no trends were present to begin with or because trends have been statistically removed. For methods of removing trends see Kendall (13, Ch. 29).

Fig. 1. Hypothetical productivity time series and learning curve (productivity, hundreds of pieces, against days on the job).

Fig. 2. Deviations from hypothetical productivity learning curve and periodic function (productivity, hundreds of pieces, against days on the job).

In terms of the familiar Pearson product-moment correlation formula, the serial autocorrelation for a particular time displacement, $\tau$, would be:

$$r_{xx'}(\tau) = \frac{\sum_{i=1}^{N_\tau} x_i x_{i+\tau}}{N_\tau\,\sigma_x^2} \qquad (1)$$

and the serial crosscorrelation would be

$$r_{xy}(\tau) = \frac{\sum_{i=1}^{N_\tau} x_i y_{i+\tau}}{N_\tau\,\sigma_x\sigma_y} \qquad (2)$$

where $x_i$ and $y_i$ are the deviation values of the respective time series at predetermined points, $i$, along the time axis and $N_\tau$ indicates the number of these points. As $\tau$ is increased, for a given time series, the number of available products decreases, so that when $i + \tau_N$ indicates the last value in the series, $x_N$, there is only one product, $x_1 x_N$, in the numerator with which to estimate the value of the autocorrelation with displacement $\tau_N$. Crosscorrelation works in the same way except that the displacement, $\tau$, is added to the series of $y$ values rather than to the $x$ series, and the last product is $x_1 y_N$.⁴ In practice, of course, auto- and crosscorrelations would not be computed for displacements so large that $N_\tau$ was very small. It is apparent that these serial correlations (as any product-moment coefficients) will vary between ±1.00. Usual significance tests of product-moment r are appropriate.⁵
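Equations 1 and 2 translate directly into code. The sketch below follows them literally. The deviations and the single variance (or pair of standard deviations) are computed once from the whole series and reused for every displacement, as footnote 4 describes, and only the $N_\tau$ available products are averaged. The function names are our own.

```python
def deviations(series):
    """Deviation values x_i = X_i - mean(X)."""
    mean = sum(series) / len(series)
    return [v - mean for v in series]


def _sd(dev):
    """Standard deviation of already mean-removed values (divisor N)."""
    return (sum(v * v for v in dev) / len(dev)) ** 0.5


def serial_autocorrelation(series, tau):
    """Equation 1: sum_{i=1}^{N_tau} x_i x_{i+tau} / (N_tau * sigma_x^2)."""
    x = deviations(series)
    n_tau = len(x) - tau
    if n_tau < 1:
        raise ValueError("displacement leaves no products")
    return sum(x[i] * x[i + tau] for i in range(n_tau)) / (n_tau * _sd(x) ** 2)


def serial_crosscorrelation(series_x, series_y, tau):
    """Equation 2: the displacement tau is applied to the y series."""
    x, y = deviations(series_x), deviations(series_y)
    n_tau = min(len(x), len(y)) - tau
    if n_tau < 1:
        raise ValueError("displacement leaves no products")
    return sum(x[i] * y[i + tau] for i in range(n_tau)) / (n_tau * _sd(x) * _sd(y))


alternating = [1, -1] * 5
print([round(serial_autocorrelation(alternating, tau), 3) for tau in range(4)])
```

A strictly alternating series correlates −1 with itself at odd lags and +1 at even lags, which makes a quick sanity check.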


3 There has been considerable confusion of terminology in this area. Kendall (13, p. 402) uses “serial
correlation” to mean what is here called “serial auto- 
correlation.” Other writers, such as Anderson (2), 
have used “autocorrelation” as equivalent to the 
present “serial autocorrelation” and “serial correla- 
tion” as equivalent to “serial crosscorrelation.” An- 
derson also quotes Yule as using “lagged serial cor- 
relation” as equivalent to “serial autocorrelation.” 
Some of the difficulty has arisen because these writers 
have been concerned only with temporal correlation 
of discrete data and thus have no need to distin- 
guish these from the correlation functions. 

4 Formulas (1) and (2) imply the calculation of one estimate of $\sigma_x^2$, $\sigma_x$, and $\sigma_y$ for all displacements, $\tau$. As $N_\tau$ becomes small it is probably desirable to estimate these parameters by using only those terms of the series which are used in the cross products. See Kendall's formula 30.7 (13, p. 402).

5 Anderson (1, 2) and Hannam (11) discuss cer- 
tain special significance problems and tests connected 
with serial correlations. Hoel (12) lists a non- 
parametric test for “temporal relatedness” or “runs.” 




If, in the case of discrete data, the standard deviation term in the formula for autocorrelation is dropped, a covariance form results.⁶ This form is an approximate autocorrelation function for continuous data:

$$\phi_{xx}(\tau) \approx \frac{1}{N_\tau}\sum_{i=1}^{N_\tau} x_i x_{i+\tau} \qquad (3)$$

which is asymptotic to the autocorrelation function

$$\phi_{xx}(\tau) = \lim_{T\to\infty}\frac{1}{2T}\int_{-T}^{T} x(t)\,x(t+\tau)\,dt \qquad (4)$$

where, instead of summing $N_\tau$ deviation value products, the product of two functions is integrated over the time interval from $-T$ to $T$. Thus, if a continuous time series, $x(t)$, was made discrete by using only certain points, calling these values $x_i$, the analogous discrete autocorrelation function would be obtained.

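Similarly, the discrete crosscorrelation function may be defined as

$$\phi_{xy}(\tau) \approx \frac{1}{N_\tau}\sum_{i=1}^{N_\tau} x_i y_{i+\tau} \qquad (5)$$

and the continuous form as

$$\phi_{xy}(\tau) = \lim_{T\to\infty}\frac{1}{2T}\int_{-T}^{T} x(t)\,y(t+\tau)\,dt \qquad (6)$$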


Since Equations 4 and 6 require an averaging process in time, they are called time averages. The autocorrelation function is a continuous symmetrical function with a maximum at $\tau = 0$.⁷ In the absence of periodicities, the function is asymptotic to the square of the mean of the “random function.” Therefore, if the mean is zero the autocorrelation function tends to zero as the displacement, $\tau$, tends to infinity. Autocorrelation for a time series without periodicities (a random function) might look like Fig. 3.
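The statement that the autocorrelation of a random function with nonzero mean levels off at the square of the mean can be checked numerically. The check uses the covariance form, Equation 3, applied to raw rather than mean-removed values. The sketch assumes a hypothetical first-order autoregressive "random function", in which each value partly carries over from the last; this gives a decaying curve of the general shape described for Fig. 3. The mean, carryover, seed, and series length are arbitrary.

```python
import random


def covariance_autocorrelation(values, tau):
    """Equation 3 (covariance form): mean of the N_tau products v_i * v_{i+tau}."""
    n_tau = len(values) - tau
    return sum(values[i] * values[i + tau] for i in range(n_tau)) / n_tau


def ar1_series(n, mean, carryover, rng):
    """Hypothetical random function: mean + e_i, with e_i = carryover * e_{i-1} + noise."""
    e, out = 0.0, []
    for _ in range(n):
        e = carryover * e + rng.gauss(0.0, 1.0)
        out.append(mean + e)
    return out


rng = random.Random(1956)
series = ar1_series(100_000, mean=2.0, carryover=0.8, rng=rng)
for tau in (0, 5, 50):
    print(tau, round(covariance_autocorrelation(series, tau), 2))
# For these settings theory gives about 6.78, 4.91, and 4.00 (the mean squared).
```

Removing the mean before applying Equation 3 would make the same curve tend to zero instead, as the text notes.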


6 The covariance form might actually be used in
practice for a given time series since the standard 
deviation terms would be relatively constant and the 
appearance of the plotted correlation functions 
would not be affected. 

7 The restrictions implied for the autocorrelation 
function in communication engineering are that it be 
a damped even function with the maximum at the 
origin. Such properties need not be the case in 
other areas of application, although the function 
will approach evenness as $T \to \infty$.


Fig. 3. Hypothetical autocorrelation function plot depicting a random function ($\phi_{xx}(\tau)$ against displacement, $\tau$).


Crosscorrelation is a measure of coherence 
between two functions. For independent ran- 
dom functions the crosscorrelation function is 
a constant which is the product of the indi- 
vidual mean values of the functions. Thus, if 
one mean were zero the crosscorrelation would 
be zero everywhere. This is called incoherence 
and is reminiscent of Pearson r, which has a 
zero value under analogous conditions. 

Without resorting to a mathematical demonstration, it may be pointed out that the autocorrelation function, $\phi_{xx}(\tau)$, of a periodic function, $x(t)$, is periodic itself, retains the fundamental frequency and harmonics of $x(t)$, but drops all phase angles. The crosscorrelation function, $\phi_{xy}(\tau)$, retains the fundamental frequency only if both $x(t)$ and $y(t)$ contain it, and retains only those harmonics which are present in both, along with their phase differences.
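For a single sinusoid, the statement can be verified in a few lines from Equations 4 and 6 using the identity $\sin a \sin b = \tfrac{1}{2}[\cos(a-b) - \cos(a+b)]$. The amplitudes $A$ and $B$, the frequency $\omega$, and the phase $\theta$ are generic symbols introduced for this worked example.

$$
x(t) = A\sin(\omega t + \theta):\qquad
\phi_{xx}(\tau) = \lim_{T\to\infty}\frac{1}{2T}\int_{-T}^{T} A^2\sin(\omega t + \theta)\,\sin(\omega t + \omega\tau + \theta)\,dt = \frac{A^2}{2}\cos\omega\tau
$$

The $\cos(2\omega t + \omega\tau + 2\theta)$ term averages to zero, so the phase $\theta$ disappears while the frequency $\omega$ remains.

$$
x(t) = A\sin\omega t,\quad y(t) = B\sin(\omega t + \theta):\qquad
\phi_{xy}(\tau) = \frac{AB}{2}\cos(\omega\tau + \theta)
$$

The crosscorrelation keeps both the common frequency and the phase difference $\theta$. If $y(t)$ had a different frequency, every term would average to zero and $\phi_{xy}(\tau)$ would vanish.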


Fig. 4. Hypothetical autocorrelation function plot depicting a periodic function plus randomness (curves for the random component and for the random plus periodic components, against displacement, $\tau$).


Figure 4 shows $\phi_{xx}(\tau)$ for a sine function plus randomness in contrast to Fig. 3, the autocorrelation function of a random function alone. A crosscorrelation graph of two sine functions would not show the random component, nor necessarily have the same period, but the general character of the curve would be preserved. Phase differences would tend to keep one of the maxima from lying on the ordinate.

If periodicities are present in the time series, it is evident that large enough values of $\tau$ need to be taken in autocorrelation so that the random influences approach zero as in Fig. 4. Thus, if $x(t)$ is a mixture of periodic and random components, then

$$x(t) = x_p(t) + x_r(t).$$

By application of the definition of $\phi_{xx}(\tau)$, Equation 4, the autocorrelation of $x(t)$ is

$$\phi_{xx}(\tau) = \lim_{T\to\infty}\frac{1}{2T}\int_{-T}^{T}\left[x_p(t) + x_r(t)\right]\left[x_p(t+\tau) + x_r(t+\tau)\right]dt.$$

When the two binomials are multiplied together and the limits of integration applied to each term, the result is

$$\phi_{xx}(\tau) = \phi_{x_p x_p}(\tau) + \phi_{x_p x_r}(\tau) + \phi_{x_r x_p}(\tau) + \phi_{x_r x_r}(\tau).$$

The first and last terms on the right are the autocorrelation functions for the periodic performance and the random function, respectively. The center terms are their crosscorrelation functions. As a matter of convenience let the means of $x_p(t)$ and $x_r(t)$ be zero. The function $\phi_{x_r x_r}(\tau)$ is nonperiodic and goes to zero as $\tau$ approaches infinity. Due to incoherence, $\phi_{x_p x_r}(\tau)$ and $\phi_{x_r x_p}(\tau)$ vanish, leaving $\phi_{x_p x_p}(\tau)$ as the value of $\phi_{xx}(\tau)$, showing that the periodic component is responsible for the correlation as $\tau$ approaches infinity.

Similarly, the crosscorrelation can be shown to be

$$\phi_{xy}(\tau) = \phi_{x_p y_p}(\tau).$$

Crosscorrelation has the advantage of not being distorted around $\tau = 0$ because $\phi_{x_r y_r}(\tau)$ is incoherent and, with the zero-mean assumption, vanishes everywhere.
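The decomposition can be illustrated with a discrete simulation using the covariance form, Equation 3, and its crosscorrelation analogue, Equation 5. The sketch assumes a hypothetical zero-mean series made of a unit sine wave with a period of 20 samples plus independent Gaussian noise of unit variance. It also builds a second series that carries the same sine, delayed by 3 samples, with its own independent noise. The seed and lengths are arbitrary.

```python
import math
import random


def phi(x, y, tau):
    """Equations 3 and 5: mean of the N_tau products x_i * y_{i+tau}."""
    n_tau = min(len(x), len(y)) - tau
    return sum(x[i] * y[i + tau] for i in range(n_tau)) / n_tau


rng = random.Random(4)
n, period = 20_000, 20
periodic = [math.sin(2 * math.pi * i / period) for i in range(n)]
x = [p + rng.gauss(0.0, 1.0) for p in periodic]
# y carries the same cycle, delayed by 3 samples, plus its own independent noise.
y = [math.sin(2 * math.pi * (i - 3) / period) + rng.gauss(0.0, 1.0) for i in range(n)]

print("phi_xx(0)  =", round(phi(x, x, 0), 2))    # periodic part 0.5 plus noise variance 1
print("phi_xx(10) =", round(phi(x, x, 10), 2))   # periodic part only: about -0.5
print("phi_xx(20) =", round(phi(x, x, 20), 2))   # periodic part only: about +0.5
print("phi_xy(3)  =", round(phi(x, y, 3), 2))    # noise terms vanish: about 0.5
```

Only the value at zero displacement carries the random component's variance. Away from the origin the autocorrelation follows the periodic part alone, and the crosscorrelation is free of the noise even near the origin.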

There is no method available for testing significance of the correlation functions. Usually this will present little difficulty, for if the continuous form is calculated over a satisfactorily long $T$ or if the discrete form is calculated for large $N$, significance may generally be assumed for moderate or large $\phi(\tau)$ at the maxima.

A method of analysis which is closely related to the autocorrelation function is the power density spectrum, $\Phi_{xx}(\omega)$. The functions $\phi_{xx}(\tau)$ and $\Phi_{xx}(\omega)$ are Fourier cosine transforms of each other. That is,

$$\phi_{xx}(\tau) = 2\int_{0}^{\infty}\Phi_{xx}(\omega)\cos\omega\tau\,d\omega$$

and

$$\Phi_{xx}(\omega) = \frac{1}{\pi}\int_{0}^{\infty}\phi_{xx}(\tau)\cos\omega\tau\,d\tau.$$

Thus, when either $\phi(\tau)$ or $\Phi(\omega)$ is known, the other may be found. In power density spectrum analysis, power density is plotted against frequency rather than time displacement. This method is used frequently in studies of human tracking behavior.⁸
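In discrete form, the cosine transform can be approximated by a sum over the estimated lags. The sketch below first estimates $\phi_{xx}(\tau)$ for $\tau$ = 0 to 100 with the covariance form (Equation 3). It then evaluates $\Phi_{xx}(\omega)$, up to a constant factor, as the trapezoidal sum $\phi(0)/2 + \sum_{\tau \ge 1}\phi(\tau)\cos\omega\tau$ on a grid of frequencies, so that a hidden cycle shows up as a peak rather than as a wave. The data (a period-20 sine plus unit noise), the lag limit, and the frequency grid are assumptions for illustration. Practical spectrum estimates usually also taper the lags, which is omitted here.

```python
import math
import random


def autocovariance(x, tau):
    """Equation 3 on mean-removed values: mean of N_tau products."""
    n_tau = len(x) - tau
    return sum(x[i] * x[i + tau] for i in range(n_tau)) / n_tau


def power_density(phis, omega):
    """Trapezoidal cosine transform of phi(0..M), up to a constant factor."""
    return phis[0] / 2 + sum(p * math.cos(omega * tau) for tau, p in enumerate(phis) if tau > 0)


rng = random.Random(29)
raw = [math.sin(2 * math.pi * i / 20) + rng.gauss(0.0, 1.0) for i in range(4000)]
mean = sum(raw) / len(raw)
x = [v - mean for v in raw]

max_lag = 100
phis = [autocovariance(x, tau) for tau in range(max_lag + 1)]
omegas = [0.01 + 0.005 * k for k in range(627)]  # about 0.01 to 3.14 radians per sample
spectrum = [power_density(phis, w) for w in omegas]
peak_omega = omegas[spectrum.index(max(spectrum))]
print(f"peak at omega = {peak_omega:.3f}, period = {2 * math.pi / peak_omega:.1f} samples")
```

The peak falls near $\omega = 2\pi/20$, recovering the 20-sample cycle that was buried in noise of equal strength.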


Computation 


For brevity and ease of presentation only 
autocorrelation computation will be discussed. 
The only difference in computing the cross- 
correlation lies in reading one value from the 
curve of one time series and the $\tau$-displaced
value from the other. 

Calculation of serial autocorrelation is straightforward Pearson correlation procedure. The original set of data will constitute the variable $X$. From this set a new variable $X'$ will be constructed such that the second score of the original set is now the first score of the new set, the third score is now the second, and so on. By using any of the methods of computation for Pearson r on these two sets, $X$ and $X'$, a serial autocorrelation with “lag one” is calculated. Serial autocorrelations of greater than lag one may be calculated in a similar manner by constructing new sets of scores, $X''$, $X'''$, …, by displacing the scores correspondingly. By plotting these serial autocorrelation values versus the lag, periodicities and the random function may be noted.⁹

8 Another method used for studying temporal phenomena involves certain information measures (5, 10, 18, 19, 24, 25). For other general discussions of autocorrelation functions and related techniques see References 17, 27, and 29.

Autocorrelations for continuous data are not 
readily calculated by hand except by changing 
such data into discrete form and then treating 
as above. Machine methods of calculating 
autocorrelation functions do exist, however.

If an analog autocorrelator is available it will follow these steps: (a) the function, $x(t)$, is displaced by a small interval, $\tau$, resulting in $x(t + \tau)$; (b) these two functions are continuously multiplied; (c) the product is integrated (continuously added); and (d) the average value of the integral is taken over the interval of integration. This entire procedure would then be repeated for other values of $\tau$.

If a digital computer is available, a second 
machine method of calculation of the auto- 
correlation function is utilized. This method 
is essentially one of changing the form of the 
data from continuous to discrete and calcu- 
lating a serial approximation to the autocorre- 
lation function. Analogous to the assumption 
that discrete samples are from a continuous 
distribution (a common assumption in statis- 
tics), the sequence of discrete points in the 


9 Other coefficients, where applicable, might be used in place of Pearson r, for instance phi (8) and tetrachoric r. Indeed, Wertheimer (28) used tetrachoric coefficients in one of his studies. However, such statistics have the same limitations in this application as in any other. Chapanis (5) has used chi square.


Fig. 5. Subdivision of a continuous time series to obtain discrete data (horizontal axis: time).


temporal series is likewise assumed to be an accurate representation of the continuous functions that make up the time series. The procedure follows: (a) the function, $x(t)$, is divided into sections of duration $L$ (see Fig. 5; these sections are chosen such that, when periodicities are present, the junction points of the sections do not always occur at fixed locations relative to the periodicities);¹⁰ (b) the $a_i$ values are determined by evaluating $x(t)$ at the $L$ junctions, the $b_i$ values at a constant time $\tau$ after the $a_i$'s; (c) corresponding $a$'s and $b$'s are multiplied and summed; and (d) the sum is divided by the total number of products. Again, the procedure would be repeated for other values of $\tau$.
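Steps (a) through (d), together with the standardizing step the text goes on to describe, can be sketched as follows. The test signal $\sin t$ and the section length $L = 0.37$ are assumptions. They make $L$ much shorter than the period $2\pi$ and keep the junction points from landing repeatedly at the same phase. The standardized version divides the covariance about the means by the product of the standard deviations of the $a$ and $b$ values, giving a Pearson r. The function names are our own.

```python
import math


def sectioned_samples(signal, tau, section_length, n_sections):
    """Steps (a) and (b): a_i at the section junctions, b_i a constant time tau later."""
    a = [signal(i * section_length) for i in range(n_sections)]
    b = [signal(i * section_length + tau) for i in range(n_sections)]
    return a, b


def sectioned_covariance(a, b):
    """Steps (c) and (d): multiply corresponding values, sum, divide by the count."""
    return sum(ai * bi for ai, bi in zip(a, b)) / len(a)


def standardized(a, b):
    """Pearson r: covariance about the means divided by the two standard deviations."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((ai - ma) * (bi - mb) for ai, bi in zip(a, b)) / n
    sa = math.sqrt(sum((ai - ma) ** 2 for ai in a) / n)
    sb = math.sqrt(sum((bi - mb) ** 2 for bi in b) / n)
    return cov / (sa * sb)


for tau in (0.0, math.pi / 2, math.pi):
    a, b = sectioned_samples(math.sin, tau, section_length=0.37, n_sections=5000)
    print(f"tau = {tau:.3f}: covariance {sectioned_covariance(a, b):.3f}, r {standardized(a, b):.3f}")
```

The covariance approximates $\tfrac{1}{2}\cos\tau$, the continuous autocorrelation of $\sin t$. The standardized values lie between −1 and +1.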

Steps c and d of this process may be expressed by the formula

$$\phi_{xx}(\tau) \approx \frac{1}{N_\tau}\sum_{i=1}^{N_\tau} a_i b_i .$$

Noting the correspondence of $a_i$ to $x_i$ and of $b_i$ to $x_{i+\tau}$, this relationship is very much like the original expression for $r_{xx'}$ given by Equation 1. The difference is that in Equation 1 the variance is present in the denominator.


10 More precisely, when choosing the number of L's for discrete sampling of the function, the number of samples should slightly exceed the number of cycles of the highest frequency component, so that $L < \min p$, where $\min p$ is the period of the highest frequency. In general, the size of $L$ should be chosen so that $x(t)$ does not change appreciably during $L$.




This fact shows clearly that the autocorrelation function as defined does not vary between +1 and −1. To accomplish this the next step would be to divide the covariance by the product of the standard deviations. In the case of autocorrelation with large $N$, the $\sigma$'s would approach equality because the same function is responsible for the $a_i$ and $b_i$ values. In crosscorrelation there will usually be a difference in the two values of $\sigma$ and the two means. The result of this standardizing process in either case is the Pearsonian r. Here again the r must be computed for each different value of $\tau$ desired.

As the number of sections of duration $L$ gets larger, the calculated value approaches the value of the autocorrelation function for the given $\tau$. For exact equivalence the number of sections would have to be infinite.

Occasionally some values of the autocorrelation calculated for large $\tau$ may exceed previous maxima. These are generally ignored since they result from the small $N$ for large $\tau$. The origin value may be exceeded for a $\tau$ with large $N$ when a combination of periodic components occurs in phase.


Applications 


Temporal correlation techniques have been 
applied in several areas of psychology thus 
far. It is envisioned that many more will be 
found. 

In the area of psychophysics several in- 
vestigations (8, 26, 28) have been carried 
out to demonstrate that successive trials in 
psychophysical experiments cannot represent 
samplings from a population of independent 
responses. Typically, these studies have ob- 
tained “yes-no” responses in threshold-deter- 
mination situations for successive trials. In- 
tertrial interval has been varied from a few 
seconds to as long as a day. Serial autocor- 
relations or significance tests corresponding 
to these correlations have given results much 
like the right half of the curve shown in Fig. 
3. The interpretation is that (a) the effects 
of one trial show up on successive trials in 
such determinations, and (b) as the displacement between trials is increased, the strength
of the effect decreases. 

The crosscorrelation function and the power 




density spectrum have found extended use in 
studies of human tracking behavior (6, 7, 14, 
15, 16). In such studies the target is moved about in
some fashion called the disturbance function. 
In (pursuit) tracking the follower is moved 
by the subject in response to the disturbance 
function. The subject’s response curve and 
the disturbance function are crosscorrelated. 
The crosscorrelation will generally be peri- 
odic and will have a maximum at some dis- 
placement such that the given response cor- 
responds to some earlier disturbance. This 
displacement is called the reaction time and 
generally equals about half a second. 
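The reaction-time reading of a tracking crosscorrelation can be sketched with synthetic data. Everything here is hypothetical. The disturbance function is a sum of three sines, the simulated response simply reproduces it 0.5 second late with a little noise, and the sampling interval is 0.05 second. The displacement at which the crosscorrelation (Equation 5) peaks is then read as the reaction time.

```python
import math
import random

DT = 0.05      # seconds per sample (assumed)
DELAY = 0.5    # simulated reaction time in seconds (assumed)


def disturbance(t):
    """Hypothetical target motion: a sum of three sines."""
    return math.sin(0.9 * t) + 0.6 * math.sin(2.3 * t + 1.0) + 0.4 * math.sin(4.1 * t + 2.0)


def crosscorrelation(x, y, lag):
    """Equation 5: mean of the N_tau products x_i * y_{i+lag}."""
    n_tau = min(len(x), len(y)) - lag
    return sum(x[i] * y[i + lag] for i in range(n_tau)) / n_tau


rng = random.Random(15)
n = 4000
target = [disturbance(i * DT) for i in range(n)]
response = [disturbance(i * DT - DELAY) + rng.gauss(0.0, 0.1) for i in range(n)]

lags = range(0, 41)  # 0 to 2 seconds
values = [crosscorrelation(target, response, lag) for lag in lags]
best_lag = max(lags, key=lambda lag: values[lag])
print(f"crosscorrelation peaks at {best_lag * DT:.2f} s")
```

With real tracking records, the response would also be smoothed and distorted rather than merely delayed, so the peak would be broader than in this idealized case.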

Philpott (20, 21, 22) has investigated out- 
put fluctuation in group performance of rela- 
tively simple mental tasks such as sub- 
stitution, easy arithmetic, dotting, etc. He 
discovered that output peaks occurred at 
predictable intervals and that the peaks 
tended to follow a pattern of generally in- 
creasing magnitude, which could be predicted 
by the simultaneous occurrences in phase of 
sine-like waves of different frequencies. Phil- 
pott’s investigations have, however, been 
criticized recently by Richardson (23) on 
statistical grounds. 

In a brief description of their work, Bar- 
low and Brazier (3) tell of studying “auto- 
correlation of spontaneous activity in the 
cortex.” They are also using the crosscorre- 
lation function “for the detection of responses 
in brain potentials evoked by experimental 
sensory stimuli.” 

There are many other areas of psychological research in which temporal correlation techniques might prove fruitful. Time and motion study might benefit from such analysis, since definite “rhythms” or periodicities in routine tasks may be useful in the performance of such tasks. Studies of operant conditioning, where periods of activity and inactivity need specification and explanation, are another possibility. A study by Bixenstine (4) would seem to suggest that physiological measures such as palmar sweating have periodicities, with a possible period of one week. Temporal correlation might also be applied in a theoretical investigation of test-retest reliability, where the curve obtained would presumably be that of a so-called random function.


Application of Temporal Correlation Techniques 


In a recent literature review, Fiske and 
Rice (9) have discussed a wide variety of 
studies of what they term “intra-individual 
variability.” In this review they are con- 
cerned with predictable differences between 
“responses [which] show no systematic trend 
over time” [stationary time series]. These 
writers point out the extreme importance of 
such behavior and also point out that little 
systematic effort has been devoted to the 
area, although many isolated studies have 
been carried out. One of the most striking 
features about these studies is the wide va- 
riety of methods used to determine the na- 
ture of such time-varying behavior. Fiske 
and Rice indicate the unsatisfactory nature 
of some of these methods. While they rather 
pointedly ignore the autocorrelation methods 
in their discussions of methodology, it would 
seem that in many such instances these tech- 
niques would be ideally suited to the prob- 
lem at hand. 

Hopefully, investigators in other specific 
areas might see the usefulness of the tech- 
niques in their specialties. Certainly, it is 
only through the enrichment of psychological 
methods by addition of such specific tech- 
niques that psychology can aspire to predict 
behavior to its fullest extent. 


Summary 


Definitions and computation procedures for 
various temporal correlation techniques are 
presented. These techniques include serial 
correlations for discrete data and correlation 
functions for continuous data. Specifically 
described are autocorrelations for temporal 
relatedness within one series of data, and 
crosscorrelations for such relatedness between 
two series. These techniques are appropriate 
for discovery of both cyclical and noncyclical 
temporal phenomena. Various applications of 
temporal correlation techniques within psy- 
chology are described. 


Received February 23, 1956. 
Early Publication. 
References 


1. Anderson, R. L. Distribution of the serial correlation coefficient. Ann. math. Statist., 1942, 13, 1-13.


2. Anderson, R. L. The problem of autocorrela- 
tion in regression analysis. J. Amer. statist. 
Ass., 1954, 49, 113-129. 

3. Barlow, J. S., & Brazier, M. A. B. Correlation 
studies of brain potentials. MIT Quart. 
Progr. Rep., April 1955, 79-82. 

4. Bixenstine, V. E. A case study of the use of
palmar sweating as a measure of psychologi- 
cal tension. J. abnorm. soc. Psychol., 1955, 
50, 138-143. 

5. Chapanis, A. Random-number guessing behav- 
ior. Amer. Psychologist, 1953, 8, 332. 

6. Clark, J. R., Fontaine, A. B., & Warren, C. E. The generation of a continuous random signal for use in human tracking studies. USAF, Hum. Resour. Res. Cent., Res. Bull., 1953, No. 53-40.

7. Clark, J. R., & Warren, C. E. A photometric
correlator. USAF, Hum. Resour. Res. Cent., 
Res. Bull., 1953, No. 53-42. 

8. Collier, G. Intertrial association at the visual 
threshold as a function of intertrial interval. 
J. exp. Psychol., 1954, 48, 330-334. 

9. Fiske, D. W., & Rice, Laura. Intra-individual 
response variability. Psychol. Bull., 1955, 52,
217-250. 

10. Frick, F. C., & Miller, G. A. A statistical de- 
scription of operant conditioning. Amer. J. 
Psychol., 1951, 64, 20-36. 

11. Hannan, E. J. Exact tests for serial correlation.
Biometrika, 1955, 42, 316-326. 

12. Hoel, P. G. Introduction to mathematical sta- 
tistics. New York: Wiley, 1947. 

13. Kendall, M. G. The advanced theory of sta- 
tistics, Vol. 2. London: Griffin, 1948. 

14. Krendel, E. S. A preliminary study of the 
power-spectrum approach to the analysis of 
perceptual-motor performance. USAF, WADC 
Tech. Rep., 1951, 6723. 

15. Krendel, E. S. The spectral density study of 
tracking performance: Part 1, The effect of 
instructions. USAF, WADC Tech. Rep., 1952, 
No. 11. 

16. Krendel, E. S. The spectral density study of 
tracking performance: Part 2, The effects of 
input amplitude and practice. USAF, WADC 
Tech. Rep., 1952, No. 11. 

17. Lee, Y. W., Cheatham, T. P., & Wiesner, J. B. 
Application of correlation analysis to the de- 
tection of periodic signals in noise. Proc. 
IRE, Oct. 1950. 

18. Newman, E. B. Computational methods useful in analyzing series of binary data. Amer. J. Psychol., 1951, 64, 252-262.

19. Newman, E. B. The pattern of vowels and con- 
sonants in various languages. Amer. J. Psy- 
chol., 1951, 64, 369-379. 

20. Philpott, S. J. F. Fluctuations in human output. Brit. J. Psychol. Monogr. Suppl., 1933, 6, No. 17.


21. Philpott, S. J. F. The curve of fluctuations in mental output and the curve of numbers of factors in the natural numbers. Brit. J. Psychol., 1949, 39, 123-141.

22. Philpott, S. J. F. Fluctuations in mental out- 
put. Quart. Bull. Brit. Psychol. Soc., 1950, 
1, 264-280. 

23. Richardson, L. F. Dr. S. J. F. Philpott’s wave
theory. Brit. J. Psychol., 1952, 43, 468-475. 

24. Senders, Virginia. Further analysis of response sequences in the setting of a psychophysical experiment. Amer. J. Psychol., 1953, 66, 215-228.

25. Senders, Virginia, & Sowards, A. Analysis of response sequences in the setting of a psychophysical experiment. Amer. J. Psychol., 1952, 65, 358-374.

26. Verplanck, W. S., Collier, G. S., & Cotton, J. W. 
Non-independence of successive responses in 
measurements of the visual threshold. J. exp. 
Psychol., 1952, 44, 273-282. 

27. Wiener, N. The extrapolation, interpolation and smoothing of stationary time series. New York: Wiley, 1949.

28. Wertheimer, M. An investigation of the “randomness” of threshold measurements. J. exp. Psychol., 1953, 45, 294-303.

29. Wise, J. The autocorrelation function and the 
spectral density function. Biometrika, 1955, 
42, 151-159. 


Journal of Applied Psychology
Vol. 40, No. 5
October, 1956


Some Effects of Prolonged Experience in Communication Nets¹


Marvin E. Shaw and Gerard H. Rothschild 
The Johns Hopkins University 


Numerous experimental studies have demonstrated that the arrangement of communication channels among the members of a group has a significant effect upon group performance and satisfaction (1, 2, 3, 5, 6, 7, 9, 11, 12, 13). Generally speaking, the communication net which permits more nearly equal participation by the group members results in higher member satisfaction and, when the task is to solve a relatively complex problem, smaller time-and-error scores than does a communication net which restricts the participation of some group members more than others.

These experiments have all used relatively 
short experimental periods, usually one ses- 
sion of about 50 minutes, although some have 
required one session of 2 to 24 hours (7, 13). 
It is possible that the observed effects of the 
communication net are temporary in nature, 
and that such differences would disappear 
(or perhaps reverse in direction) if groups 
were required to function on a day-to-day
basis. The present experiment was designed 
to check on this possibility. 


Method 
Apparatus 


The apparatus used in this experiment was the same as that described in earlier reports (9, 11). It consists of four cubicles which are connected with each other by slots through the walls separating them. The Ss communicate by writing messages on 3 × 5 cards and passing them through slots. Various communication nets can be imposed by closing the appropriate slots. The three communication nets used in this experiment are shown in Fig. 1.

1 This experiment was done under Contract N5-ori-166, Task Order 1, between the Office of Naval Research and The Johns Hopkins University. This is Report No. 166-I-202, Project Designation No. NR 145-089, under that Contract.


Procedure

There were 20 problems requiring simple arith- 
metical computations similar to those described in 
earlier reports (9, 12). Eight items of information 
were needed to solve each problem. At the start of 
any test, each S was given two of these items. The 
order of presentation of problems to a given group 
was random, except that each order of presentation 
in one net was replicated in each of the other two 
nets.

All Ss were male undergraduates at The Johns 
Hopkins University. They were paid for their serv- 
ices. Eight groups of four Ss each were randomly 
assigned to each of the three nets. The experimental 
design required that each group meet each day (ex- 
cluding Saturdays and Sundays) at approximately 
the same time for a total of ten days. Each group 
solved two problems each day. At the beginning of 
the experiment, Ss were told the general nature of 
the task, the method of communication, and who 
could communicate with whom. 

At the end of the last session, Ss were required to complete a questionnaire which asked for: (a) ratings, on an 11-point scale, of over-all satisfaction with their job in the group, (b) whether the group had a leader, and if so who occupied this position, and (c) whether the group developed a system, and if so what was the nature of the system. After this questionnaire had been collected by E, Ss were asked to indicate their satisfaction with the job on a day-to-day basis. The device for obtaining this information consisted of a grid with the 11-point rating scale along the ordinate and days along the abscissa. S rated his satisfaction with the group situation for each day (as he remembered it) by simply checking the appropriate space. This gave a graphic picture of the course of satisfaction over time in the various nets.

Fig. 1. The experimental nets (star, slash, and comcon). The numbers within circles are the Independence scores for each position, computed according to a formula given in a previous report (10).

Fig. 2. Mean time per problem as a function of practice in three communication nets (ordinate: time in minutes; abscissa: days).


Results 


The results of this experiment will be discussed under the following headings: (a) time, (b) message units, (c) errors, (d) ratings of satisfaction, (e) emergence of leadership, and (f) organization. In all of the analyses reported, there were large differences among groups and individuals treated alike, and most of these differences were statistically reliable. This finding agrees with the findings of previous investigations and appears to be of no great significance for the present study.

Time. Time was measured from the “go” signal to the time the last person in the group had thrown his switch indicating that he knew the answer. The means of these scores are shown in Fig. 2. Analysis of variance² yielded significant Fs for nets (p < .05) and days (p < .001). Tukey’s (14) gap test³ indicated that the comcon differed significantly from the star and the slash, but that these latter two did not differ significantly.


2 This analysis was performed upon scores which 
had been transformed by the square-root transfor- 
mation to achieve homogeneity of variance. 

3 In all applications of the gap test the .05 level 
of confidence was accepted. 
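Footnote 2 refers to the square-root transformation, which is commonly used when the spread of scores grows with their mean. Here is a minimal sketch with hypothetical time scores, not data from this experiment, contrived so the effect is easy to see. On the raw scale the slower group's variance is about four times the faster group's. After taking square roots the two variances are equal.

```python
import math
from statistics import variance

# Hypothetical time scores (minutes); not data from this experiment.
fast_group = [4.0, 9.0, 16.0]
slow_group = [25.0, 36.0, 49.0]

def variance_ratio(a, b):
    """Larger sample variance divided by the smaller one (1.0 means equal spread)."""
    va, vb = variance(a), variance(b)
    return max(va, vb) / min(va, vb)

raw_ratio = variance_ratio(fast_group, slow_group)
sqrt_ratio = variance_ratio([math.sqrt(t) for t in fast_group],
                            [math.sqrt(t) for t in slow_group])
print(f"raw: {raw_ratio:.2f}, square root: {sqrt_ratio:.2f}")  # raw: 3.97, square root: 1.00
```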




This finding is at variance with a previous in- 
vestigation which found the slash faster than 
the star (9). The earlier study, however, re- 
quired Ss to solve only three problems during 
a single session; Fig. 2 shows that the slash 
was faster than the star during the first three 
days (six problems). Thus, the previous ex- 
perimental results are in agreement with these 
if we consider only the first few problems. 

A significant amount of learning occurred 
in all nets, in agreement with previous investi- 
gations. The gap test revealed that time 
scores decreased significantly from day to day 
during the first five days, but that decreases 
thereafter were not significant. 

The average time required by the Ss in the various positions within nets did not differ significantly, although differences were in the expected direction; i.e., Ss in positions having the higher Independence scores⁴ required less time to reach a solution than did Ss in positions having lower Independence scores. Mean times and Independence scores correlated −.313, which, however, was not statistically reliable. This finding was not unexpected since Independence scores had not been shown to be related to time scores in previous experiments.

Message units. Contents of the messages 
transmitted by Ss in each group were ana- 
lyzed into units by defining a message unit 
as a simple sentence or any meaningful part 
of a complex or compound sentence. The re- 
sults of this analysis are shown in Fig. 3. 
As in the case of the time scores, the nets and 
days terms were significant (p < .001 in each 
case). These results are not in complete 
agreement with the results of short-term ex- 
periments. The star required fewer mes- 
sages than either the slash or the comcon as 
had been expected, but contrary to expecta- 
tions, the comcon required fewer messages 
than did the slash. This discrepancy fits in 
with that found in connection with the time 
scores; a possible reason for these findings is 
discussed later. 

Number of messages decreased with time in each of the three nets, presumably because Ss learned to communicate more efficiently. The gap test showed that message units decreased significantly only for the first four days.

4 The Independence score is a measure of the degree of freedom of action permitted the individual as a result of his position in the communication net. A formula for computing this score has been given in a previous report (10).

Differences between message units transmitted by Ss in various positions within nets agreed with expectations based upon previous findings. In the star, Position A (see Fig. 1) transmitted significantly (p < .01) more message units per problem (mean = 12.1) than did Positions B, C, and D (mean = 3.8). In the slash, Positions A and C transmitted significantly (p < .01) more message units per problem (mean = 14.8) than did Positions B and D (mean = 9.7). In the comcon, no significant differences were found. Mean message units per problem and Independence scores correlated .251. Again, this was in the expected direction but was not significant.

Errors. There were so few errors com- 
mitted by any of the groups that it was not 
possible to evaluate them statistically. The 
observed differences were consistent with 
previous findings in that the star produced 
more errors (mean per group = 2.0) than 
did either the slash (mean per group = 1.5) 
or the comcon (mean per group = 1.6). 

Ratings of satisfaction. The mean ratings of over-all satisfaction were not significantly different for the three nets, although differences were in the expected direction (means were 7.56, 7.94, and 8.69 for the star, the slash, and the comcon, respectively). Differences between positions within nets were significant only in the case of the star; Position A rated over-all satisfaction significantly (p < .05) higher (mean = 9.63) than did Positions B, C, and D (mean = 6.96). Mean ratings of over-all satisfaction and Independence scores correlated .953; this was as expected and is statistically significant (p < .01).

Fig. 3. Mean number of message units transmitted per problem as a function of practice in the communication nets (ordinate: message units; abscissa: days).

Fig. 4. Mean ratings of satisfaction by Ss within nets as a function of experience in the three nets (abscissa: days).

The results of the day-to-day ratings of 
satisfaction are shown in Fig. 4. Analysis of 
variance yielded significant Fs for nets (p 
< .05) and for days (p < .001). In agree- 
ment with expectations from previous experi- 
ments, the Ss in the comcon rated satisfac- 
tion higher than did Ss in the slash who in 
turn rated satisfaction higher than did Ss in 
the star. 

Differences in the course of satisfaction over time are especially interesting. The gap test revealed that the ratings increased significantly up to the third day and thereafter showed no significant differences between days. This certainly would not have been predicted from experimental evidence available heretofore. Leavitt appears to be the only previous investigator who attempted to measure satisfaction as a function of experience. He reported “trends of increasing satisfaction in the circle and decreasing satisfaction in the wheel” (6, p. 44).⁵ It should be remembered, however, that he asked for indications of satisfaction during a single session rather than between sessions as in the present experiment. Also, his groups were required to solve a different type of problem, a variable which has been shown to affect ratings of satisfaction (11).

5 Leavitt’s wheel is called “the star” in the present report.

Emergence of leadership. A leader is said to have emerged in a group if three or more of the four Ss named the same person in response to the question, “Did your group have a leader? If so, who?” According to this criterion, a leader emerged in only two of the eight groups in each of the comcon and the slash, whereas a leader emerged in all groups in the star. The difference between the star and the other two nets is statistically reliable (p < .05) and agrees with previous findings.
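The leadership criterion is simple enough to state as a function. In this hypothetical sketch each S's answer is recorded as the position he named, or None for “no leader.” The sketch does not model answers naming two persons, which occurred in some slash groups, because the report does not say how they were scored. With four Ss and a quorum of three, at most one position can meet the criterion.

```python
from collections import Counter
from typing import Optional, Sequence

def emergent_leader(nominations: Sequence[Optional[str]], quorum: int = 3) -> Optional[str]:
    """Return the position named by at least `quorum` Ss, or None if no one qualifies.

    Each entry is the position an S named as leader (e.g. "A"), or None for "no leader".
    """
    counts = Counter(name for name in nominations if name is not None)
    if not counts:
        return None
    name, votes = counts.most_common(1)[0]
    return name if votes >= quorum else None

print(emergent_leader(["A", "A", "A", None]))  # A
print(emergent_leader(["A", "C", "A", None]))  # None: only two Ss named A
```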

Although the comcon and the slash did not 
differ in this respect, the reasons why a leader 
did not emerge are probably different for the 
two nets. Spontaneous explanatory com- 
ments written in on the questionnaire by Ss, 
as well as spontaneous comments to E, indi- 
cated that a leader did not emerge in the 
slash primarily because of the conflict be- 
tween the two persons occupying Positions A 
and C, whereas in the comcon it was due to 
a feeling of equality among the group mem- 
bers. This interpretation is further bolstered 
by the fact that in four of the eight groups 
in the slash at least one person (eight Ss al- 
together) named two persons as the leader— 
those in Positions A and C. Only one person 
in the comcon named more than one person 
as the leader, and none in the star did so. 
This finding fits in rather well with the find- 
ings in regard to organizational development 
to be presented in the next section. 

We were also interested in the frequency 
with which Ss in the various positions would 
be named the leader as a function of the In- 
dependence score of that position. This 
“recognition of leadership” was found to be 
highly correlated with Independence scores 
(r= .968, p < .01), in agreement with ex- 
pectation and with previous research. 

Organization. The pattern of organization which developed in the various groups was determined by analysis of the contents and distribution of the messages transmitted during the solution of problems and by the responses of Ss to the question, “Did your group develop a system? If so, describe it briefly.” Altogether, four types of organizational patterns were distinguished: (a) each-to-all, in which all information was transmitted to all Ss and then each S solved the problem independently; (b) each-to-all plus check, which was the same as (a) except that answers were passed to other Ss for checking before being accepted; (c) central, in which all information was sent to one person who solved the problem and sent the answer to other Ss who merely accepted it; and (d) central plus check, which was the same as (c) except that the answer was checked by at least one other S in the group. A fifth category, labeled “No pattern,” includes all of those cases where no recognizable pattern emerged; e.g., groups in which all Ss said they did not have a system and in which none could be discerned in the pattern of message transmission. The results of this analysis are given in Table 1.

Table 1
Frequency of Occurrence of Organizational Patterns in Each of the Three Nets
(entries are numbers of groups; each row sums to the eight groups assigned to that net)

Net      Each-to-All  Each-to-All  Central  Central     No Pattern
                      plus Check            plus Check
Star     0            0            5        3           0
Slash    2            0            1        0           5
Comcon   2            3            3        0           0

Nets differed significantly (p < .01) in the frequency with which these patterns emerged. The comcon groups showed predominantly “each-to-all” organization, whereas all star groups developed the “central” type, with the checking procedure being initiated by about half of the groups in each of these two nets. Most of the slash groups fell into the “no pattern” classification. Characteristically, messages were sent at random until someone happened to get all of the information, or each S sent all of his original information to all other Ss with whom he was connected directly, without relaying it to less fortunate group members when necessary. In the latter case, one or the other of the two peripheral persons found that he did not have enough information to solve the problem. This fits in with the conflict between the two potential leaders mentioned above. Also, the fact that the slash was slower and sent more message units than had been expected follows from this lack of organization.


Discussion 


In discussing the results presented above, 
we shall be interested in two main questions: 
(a) changes in group behavior in the various 
nets as a function of experience, and (b) the 
extent to which the present results agree or 
disagree with the results of previous, short- 
term experiments. 

Groups solved problems faster, sent fewer messages, and became better satisfied with the task as a function of experience in the experimental situation. These findings hold for all nets and apparently were due to Ss learning how to operate in the previously unfamiliar situation. That is to say, Ss learned to send only relevant information and, except in the case of the slash, to use some system for routing information. Performance thus became more in line with expectations, and satisfaction increased accordingly. This interpretation assumes that college students expect to do very well on simple tasks of the sort used in this experiment, and that satisfaction is a function of the degree of discrepancy between expectation and actual accomplishment. This latter notion, of course, is essentially that suggested by Freud (4, p. 16) and more recently by McClelland et al. (8).

Looking at Figs. 2 and 4, one might suspect that Ss were rating satisfaction on the basis of length of time in the experimental situation. This hypothesis was rejected because individual time scores and ratings failed to correlate significantly (r = −.047). Likewise, total time per session and average rating per group correlated only −.003.

We turn now to the second question. Previ- 
ous experimental results indicated that the 
comcon should be faster, transmit more mes- 
sage units, and be more satisfying than the 
slash, which in turn should be faster, trans- 
mit more message units, and be more satis- 


285 


fying than the star. Actually, these rela- 
tionships were found only for the ratings of 
satisfaction. The comcon was faster than 
either the star or the slash, whereas these 
latter two did not differ significantly in this 
respect. Likewise, the star transmitted fewer 
message units than did either the comcon or 
the slash, but the slash sent more messages 
than did the comcon. In other words, the 
slash required both more time and messages 
than had been expected. 

The reasons for these discrepancies are 
probably very complex, but it seems to us 
that they are due largely to the failure of the 
slash groups to develop any effective organi- 
zational pattern. As we have indicated 
previously, there were reasonably clear evi- 
dences of conflict between the two logically 
possible leaders in the slash groups. This 
conflict apparently prevented effective or- 
ganization which in turn resulted in erratic 
message transmission and slower problem 
solution. (Also, the fact that either Position 
A or Position C could be the mediator be- 
tween Positions B and D probably led to 
some confusion as to who would perform this 
function; consequently, Positions B and D 
were sometimes left without enough informa- 
tion to solve the problem themselves and no 
one sent them the answer.) The rigid struc- 
ture of the star suggested only one effective 
organizational pattern—the central pattern, 
and the complete lack of structure in the 
comcon indicated a need for organization of 
some type (or perhaps a lack of need for or- 
ganization since even if Ss merely sent out 
all of their original information over all 
available channels, the each-to-all pattern 
would result and the task would be effec- 
tively completed), whereas the structure of 
the slash suggested at least two possible or- 
ganizational patterns with no evident means 
of discriminating between them.

Differences among positions within nets were in all cases in the direction expected from the results of previous research, and in most cases the differences were statistically reliable. The relationship between Independence and behavioral measures agreed with previous results for ratings of satisfaction, recognition of leadership, and time scores, but did not agree for message units transmitted.

Previous explanations in terms of freedom 
of action (6, 9) and saturation (5, 12) ap- 
pear adequate to account for these results. 


Summary 


This experiment studied the effects of certain communication nets upon group behavior when groups were required to operate in the same net over a period of several days. The communication nets were the comcon, the slash, and the star. Eight groups of four Ss each were assigned to each of the three nets. Each group solved two simple arithmetic-type problems each day for a period of ten days. Sessions were scheduled at approximately the same hour each day, and were scheduled on successive days except that Saturdays and Sundays were excluded. At the end of the experiment, Ss filled out questionnaires which asked (a) for ratings of satisfaction with job in the group both on an over-all basis and on a day-to-day basis, (b) whether the group had a leader, and (c) whether the group developed a system.

The results were as follows: (a) All groups solved problems faster, sent fewer messages, and rated satisfaction higher as a function of sessions in the net. (b) As expected, the comcon groups rated satisfaction higher than the slash groups, who rated satisfaction higher than did the star groups. (c) The comcon was faster but sent more messages than did the star, again as expected, but the slash was slower and sent more messages than did either the comcon or the star (although differences between the star and the slash with respect to time scores were not statistically reliable). (d) A leader emerged more frequently in the star than in either the comcon or the slash. (e) The comcon groups developed predominantly “each-to-all” organization, and the star developed predominantly “central” organization, but the slash appeared to be almost completely disorganized.


Received December 27, 1955. 


References 


1. Bavelas, A. Communication patterns in task- 
oriented groups. J. acoust. Soc. Amer., 1950, 
22, 725-730. 

2. Christie, L. S., Luce, R. D., & Macy, J., Jr.
Communication and learning in task-oriented 
groups. Res. Lab. of Electronics, M. I. T. 
Tech. Rep., 1952 (Rep. No. 231). 

3. Flament, C. Réseaux de communications spon- 
tanés et imposés. Paris: Commission de Psy- 
chologie sociale Année Scolaire, 1954-55. 

4. Freud, S. An outline of psychoanalysis. (Trans. by J. Strachey.) New York: Norton, 1940.

5. Gilchrist, J. C., Shaw, M. E., & Walker, L. C. Some effects of unequal distribution of information in a wheel group structure. J. abnorm. soc. Psychol., 1954, 49, 554-556.

6. Leavitt, H. J. Some effects of certain communi- 
cation patterns on group performance. J. ab- 
norm. soc. Psychol., 1951, 46, 38-50. 

7. Luce, R. D., Macy, J., Jr., Christie, L. S., &
Hay, H. D. Information flow in task-ori- 
ented groups. Res. Lab. of Electronics, M. I. 
T. Tech. Rep., 1953 (Rep. No. 264). 

8. McClelland, D. C., Atkinson, J. W., Clark, R. A., 
& Lowell, E. L. The achievement motive.
New York: Appleton-Century-Crofts, 1953. 

9. Shaw, M. E. Some effects of unequal distribu- 
tion of information upon group performance 
in various communication nets. J. abnorm. 
soc. Psychol., 1954, 49, 547-553.

10. Shaw, M. E. Group structure and the behavior 
of individuals in small groups. J. Psychol., 
1954, 38, 139-149. 

11. Shaw, M. E. Some effects of problem complexity upon problem solution efficiency in different communication nets. J. exp. Psychol., 1954, 48, 211-217.

12. Shaw, M. E. A comparison of two types of 
leadership in various communication nets. J. 
abnorm. soc. Psychol., 1955, 50, 127-134. 

13. Shelley, M. W. The effects of problem satura- 
tion in various communication networks. Un- 
published master’s thesis, Univer. of Wis- 
consin, 1953. 

14. Tukey, J. W. Comparing individual means in the analysis of variance. Biometrics, 1949, 5, 99-114.




The Journal of Applied Psychology 
Vol. 40, No. 5, 1956 


A Null-Point Discontinuous Electrical Pursuit Meter 


R. J. Shephard 
RAF Institute of Aviation Medicine, Farnborough, Hants, England 


Electrical pursuit meters were used fairly extensively during the recent war (5, 6, 2, 8). Theoretically this type of apparatus has a number of attractions: the tasks can be varied, interesting, and fairly closely related to the problem of flying an airplane; frequent readings can be obtained; and scoring can be highly objective. In practice, it did not prove a success (7). Several days of training were sometimes necessary to attain a constant performance, and difficulties were encountered with circuit variables under the rigorous conditions of heat, humidity, and reduced pressure demanded by aviation physiology.

The apparatus now to be described is an 
electrical pursuit meter of discontinuous type. 
The design is based on a Wheatstone bridge 
network, as suggested by Vere.¹ It is simple
in construction, sturdy in structure, and easy 
to operate. Further, since it embodies a null- 
balance principle, difficulties due to circuit 
variables are largely eliminated. It is ca- 
pable of examining a number of parameters 
of psychomotor performance simultaneously, 
and can be readily adapted to meet a variety 
of test situations. These favorable features 
suggest there may yet be a place in psy- 
chomotor research for electrical pursuit me- 
ters of this type; accordingly a careful evalu- 
ation of the present machine has been made 
under normal resting conditions and during 
the stress situation of high pressure breathing. 


Methods 


The Apparatus. The subject (S) is confronted by a plain panel set at an angle of about 15° to the vertical (Fig. 1). Mounted in the upper part of the panel are two 0-20 voltmeters, V₁ and V₂, separated by a distance of one foot. (This separation prevents accurate simultaneous reading of the two dials.) The operator can produce a variety of readings of V₁, and S is required to produce corresponding readings of V₂ by operating a small, centrally placed control knob.


¹ Personal communication, 1955.


The circuit is based on the familiar Wheatstone bridge (Fig. 2).² One arm of the bridge carries a uniselector that allows connection to nine different resistors. The voltage across this arm is recorded by V₁. A second arm of the bridge carries a rheostat, R₂. This is operated by S, and the voltage across it is recorded by a second (matched) voltmeter, V₂. The full movement of the rheostat is 270°, and with the small control knob several wrist movements are needed to cover this range. At one end of the scale a small amount of rotation produces a large voltage change, while at the other end a large movement of the control knob produces a comparatively small change. This allows tasks of graded difficulty to be introduced.

The remaining two arms of the bridge consist simply of matched resistors. Thus the current flowing through the galvanometer, G, is proportional to the difference in voltmeter readings, V₁ − V₂. When S has completed his task (that is to say, V₁ = V₂), no current flows through the galvanometer, so the final end point is independent of variations in resistor values. It is important to obtain a galvanometer of high sensitivity so that the two halves of the bridge are virtually independent. With the present system, no detectable change in the reading of V₁ is noted when V₂ is adjusted.
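The null-balance principle can be shown with an idealized calculation. The sketch assumes ideal voltmeters, a galvanometer that draws negligible current, and illustrative resistor values. None of these values appear in the article; Fig. 2 shows only a +28 volt supply label. Each half of the bridge is treated as a voltage divider, so the galvanometer sees V₁ − V₂. That difference vanishes when the rheostat setting equals the selected resistor, whatever the supply voltage.

```python
def bridge_readings(supply_v, r_selected, r_rheostat, r_matched):
    """Idealized Wheatstone bridge (illustrative model, not the author's values).

    Each half of the bridge is a voltage divider: the measured resistor in
    series with one of the two equal matched resistors.  Voltmeters and
    galvanometer are assumed to draw no current.
    Returns (V1, V2, galvanometer voltage).
    """
    v1 = supply_v * r_selected / (r_selected + r_matched)   # reading of V1 (uniselector arm)
    v2 = supply_v * r_rheostat / (r_rheostat + r_matched)   # reading of V2 (subject's rheostat)
    return v1, v2, v1 - v2                                  # galvanometer sees the difference


if __name__ == "__main__":
    # At balance (rheostat == selected resistor) the galvanometer voltage is
    # zero for every supply voltage, so supply drift does not move the end point.
    for supply in (24.0, 28.0, 32.0):
        v1, v2, vg = bridge_readings(supply, 1500.0, 1500.0, 1000.0)
        print(f"supply {supply:4.1f} V: V1 = {v1:5.2f}, V2 = {v2:5.2f}, V_G = {vg:+.3f}")
```

Off balance, the sign of V_G shows which way S must turn the knob. This is the quantity the galvanometer tracing records.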


Design of the Experiment 


Test procedure. The nine resistors attached to the uniselector give a possible 80 tasks, but for the present purpose 16 were considered sufficient. These were presented at the rate of one every four seconds. They varied in severity according to the size of the initial stimulus (movement of V₁), the rotation required, and the sensitivity of the rheostat in the region of final adjustment.

Subjects. The Ss were drawn from the medical 
laboratory staff. Six adults (5 male, 1 female) took 
part in the experiments under normal resting conditions. They were seated with the panel dials at
approximately eye level, and the control knob at a 
comfortable distance from the body chosen by the 
subject. Before the first test they were each shown 
a typical tracing, and it was explained that the ap- 
paratus would measure both the speed and the ac- 
curacy with which they performed the various tasks. 
Each S attended at six different times on consecu- 
tive days, and on each occasion operated the ma- 
chine for a little over 5 minutes. 

Nine male Ss took part in the pressure-breathing experiments, five attending on more than one occasion. The routine for each visit consisted of a 3-min. control run, a 2-min. run during venous congestion (cuff inflated to 78 mm. Hg around the right arm), a 2-min. run wearing the pressure-breathing helmet, 4 min. of pressure breathing (sometimes shortened owing to subjective distress), and a final 2-min. control run.

² Details of the circuit and apparatus may be obtained from the author on request.

Fig. 1. External appearance of pursuit meter.


Results 


Factors Affecting Performance Under Normal 
Resting Conditions 


Analysis of the tracing. Three parameters 
have been measured for each individual task. 
These are: 


1. Time taken by S to initiate movement 
of the rheostat (“initial response time”). 

2. Total time elapsing before completion of 
the task (“total response time”). 

3. Final error of matching accepted by the 
subject. 


For each S, 480 values have been obtained for each parameter. Performance has been
analyzed with particular reference to the ef- 
fects of experience, time of day, and intertask 
variation. 
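The schedule figures in this section fit together under one reading, which is an inference and not stated by the author: each resting session ran the 16-task sequence five times. On that reading, 480 values over six days gives 80 tasks per session. At one task every 4 sec. this takes 320 sec., which matches "a little over 5 minutes."

```python
# Consistency check of the resting-condition schedule.  The constants come
# from the text; the number of passes per session is inferred, not stated.
TASKS_PER_PASS = 16    # tasks in the test sequence
SECONDS_PER_TASK = 4   # one task every four seconds
DAYS = 6               # six sessions on consecutive days
VALUES_PER_S = 480     # values per parameter for each S

values_per_session = VALUES_PER_S // DAYS                    # tasks scored per session
passes_per_session = values_per_session / TASKS_PER_PASS     # repetitions of the sequence
session_minutes = values_per_session * SECONDS_PER_TASK / 60  # running time of one session

print(values_per_session, passes_per_session, round(session_minutes, 2))
```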

Time of day. Summing performance over individual days gives a two-way classification of Ss against time of testing. A
simple analysis of variance for data classified 
in this way (6) shows there is a highly sig- 
nificant component (P < .001) attributable 
to intersubject differences, but there is no sig- 
nificant variance attributable to the time of 
day when the observations were made. 
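A two-way classification without replication has one score per subject-by-condition cell. Its total sum of squares is split into rows, columns, and a residual that serves as the error term. The sketch below is the standard textbook computation and assumes NumPy is available. It is not necessarily the exact arithmetic of Mather's text (6). P values would still be read from F tables, or from a library such as SciPy, using the returned degrees of freedom. For 6 Ss by 16 tasks it yields 5, 15, and 75 degrees of freedom, as in Table 1.

```python
import numpy as np


def two_way_anova(table):
    """Two-way analysis of variance without replication.

    table: r x c array, one score per cell (rows = subjects,
    columns = time of day, day of testing, or task).
    Returns {source: {"SS", "df", "MS", "F"}} for rows, columns, error, total.
    """
    x = np.asarray(table, dtype=float)
    r, c = x.shape
    grand = x.mean()
    ss_rows = c * np.sum((x.mean(axis=1) - grand) ** 2)   # between subjects
    ss_cols = r * np.sum((x.mean(axis=0) - grand) ** 2)   # between columns
    ss_total = np.sum((x - grand) ** 2)
    ss_error = ss_total - ss_rows - ss_cols                 # residual (interaction) term
    df_rows, df_cols = r - 1, c - 1
    df_error = df_rows * df_cols
    ms_error = ss_error / df_error
    result = {}
    for name, ss, df in (("rows", ss_rows, df_rows), ("columns", ss_cols, df_cols)):
        ms = ss / df
        result[name] = {"SS": ss, "df": df, "MS": ms, "F": ms / ms_error}
    result["error"] = {"SS": ss_error, "df": df_error, "MS": ms_error}
    result["total"] = {"SS": ss_total, "df": r * c - 1}
    return result


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    scores = rng.normal(1.0, 0.1, size=(6, 16))   # hypothetical 6 Ss x 16 tasks
    for source, row in two_way_anova(scores).items():
        print(source, {k: round(float(v), 4) for k, v in row.items()})
```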


Experience. Since time of testing is not 
important, it is permissible to rearrange the 
data to yield a two-way classification of Ss 
against day of testing. Analyzing the vari- 
ance as before, there is no significant com- 
ponent attributable to “days” except in the 
case of “error.” Variations of error occurred 
mainly on two days, Day 4 showing a lower 
accuracy and Day 5 a higher accuracy than 
the other four days. On any one day, there 
were only slight fluctuations of performance 
from minute to minute. The absence of any 
systematic learning effect is rather surprising, 
and may be due to the fact that the Ss were 
research workers experienced in the match- 
ing of dials on electronic apparatus. 

Intertask variation. A two-way classifica- 
tion of Ss against tasks can be made by sum- 
ming the data over days. An analysis of 
variance for data classified in this way is pre- 
sented in Table 1, and it can be seen that for 
each of the three parameters examined there 
is a significant component of the total vari- 
ance attributable to intertask differences. 

Further analysis of intertask differences is made possible by calculating the over-all mean value for each task and arranging these values in order of magnitude (Table 2). The tasks are all of similar nature, and it is therefore reasonable to assume that the error variance of Table 1 is normally distributed between the different tasks. Calculating critical difference levels on this assumption, it is possible to assess the significance of individual intertask differences at different levels of probability. The three parameters of psychomotor function will be considered separately.

Table 1

Variance of Test Scores Under Resting Conditions. Two-Way Classification of Data
(Ss against performance with individual tasks)

Measure and source        SS        n (d.f.)   Mean Square   Variance Ratio   P
Initial response time
  Due to Ss                6.601      5        1.320          143.6          <.001
  Due to tasks             0.362     15        0.0241           2.62          .01
  Due to error             0.689     75        0.0092
  Total                    7.652
Total response time
  Due to Ss               11.76       5        2.352           12.9          <.001
  Due to tasks            48.57      15        3.238           17.7          <.001
  Due to error, total     (illegible in source)
Error
  Due to Ss                0.0115     5        0.0023           6.7          <.001
  Due to tasks            (SS and mean square illegible)       43.3          <.001
  Due to error            (illegible in source)
  Total                    0.2514

1. Initial response time. The mean value for initial response time is .981 units (.77 sec.). Most of the 16 tasks show initial response times that are distributed fairly closely about this mean value, but four tasks (6, 11, 10, and 8) yield values that are significantly greater than the remainder. The quantity being measured is by no means a "simple" reaction time, since it is at least four times the time required to respond to simple visual signals (4, 9). Several factors probably contribute to intertask differences. These include the pattern of the initial visual stimulus (in Tasks 4, 13, 6, and 11, the needle swings through a comparatively small angle) and distractions caused by failure to complete the previous task (particularly with Tasks 10, 8, 2, and 16).

Fig. 2. Circuit diagram for pursuit meter. (Legible labels: +28 volt d.c. supply, uniselector, galvanometer, fine and coarse galvanometer gain.)


Table 2

Intertask Differences. Mean Values for Individual Tasks Arranged in Descending Order of Performance

Initial Response Time      Total Response Time      Error
Task   Mean                Task   Mean              Task   Mean
12     0.895 units         11     2.54 units        11     0.037 cm.
9      0.921               4      2.69              10     0.039
5      0.925               14     2.84              16     0.039
14     0.925               2      2.95              6      0.040
7      0.930               16     2.97              5      0.044
15     0.940               13     2.98              14     0.045
3      0.950               6      3.01              4      0.047
1      0.953               10     3.14              2      0.048
16     0.984               3      3.19              12     0.055
4      0.984               5      3.30              13     0.055
2      0.992               8      3.53              3      0.058
13     1.012               12     3.58              8      0.061
8      1.017               15     4.43              1      0.124
10     1.027               1      4.58              15     0.142
11     1.102               7      4.66              7      0.157
6      1.140               9      4.73              9      0.179

Critical Difference Levels for Above Data
P = .05    0.124 units         0.55 units          0.024 cm.
P = .01    0.176 units         0.78 units          0.033 cm.
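Critical differences of this kind are usually computed as a least significant difference: t multiplied by the standard error of a difference between two means. The author does not give the formula, so the sketch below is one standard version with assumed inputs. It uses the error mean square from Table 1, six observations per task mean (one per S), and two-tailed t values for 75 degrees of freedom (about 1.992 at .05 and 2.643 at .01). With these inputs it gives about 0.11 units at the .05 level, somewhat below the tabulated 0.124. The published levels evidently rest on slightly different assumptions. The helper names are hypothetical.

```python
import math


def critical_difference(ms_error, n_per_mean, t_crit):
    """Least significant difference between two means that are each based
    on n_per_mean observations, using a pooled error mean square."""
    return t_crit * math.sqrt(2.0 * ms_error / n_per_mean)


def significant_pairs(means, cd):
    """Return (task_a, task_b, difference) for every pair of task means whose
    absolute difference exceeds the critical difference cd."""
    tasks = sorted(means)
    return [(a, b, round(abs(means[a] - means[b]), 3))
            for i, a in enumerate(tasks) for b in tasks[i + 1:]
            if abs(means[a] - means[b]) > cd]


if __name__ == "__main__":
    cd05 = critical_difference(ms_error=0.0092, n_per_mean=6, t_crit=1.992)
    print(f"LSD at .05 under these assumptions: {cd05:.3f} units")
    # Using the tabulated .05 level instead, on three initial-response means:
    print(significant_pairs({6: 1.140, 12: 0.895, 9: 0.921}, cd=0.124))
```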


2. Total response time. The mean value 
for total response time is 3.45 units (2.7 
sec.). Values for individual tasks are dis- 
tributed quite widely about this mean value, 
and many differ significantly from each other. 
The most important factor governing the 
time taken over any one task is the sensi- 
tivity in the region of final adjustment, Tasks 
1, 7, 9, and 15 being uniformly difficult in 
this respect. However, where the angle of 
rotation is greater than can conveniently be 
achieved with one wrist movement, this also 
assumes some significance, as in Tasks 8 and 
9 (270°), 12 and 5 (120°). 
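The task means in Table 2 reproduce the overall means quoted in the text (.981 and 3.45 units). The two unit-to-second conversions given in parentheses also agree with each other, at roughly 0.78 sec. per tracing unit. The sketch below checks this. The seconds-per-unit factor is an inference from the paired figures and is not stated by the author.

```python
# Table 2 task means in tracing units, keyed by task number.
initial = {12: 0.895, 9: 0.921, 5: 0.925, 14: 0.925, 7: 0.930, 15: 0.940,
           3: 0.950, 1: 0.953, 16: 0.984, 4: 0.984, 2: 0.992, 13: 1.012,
           8: 1.017, 10: 1.027, 11: 1.102, 6: 1.140}
total = {11: 2.54, 4: 2.69, 14: 2.84, 2: 2.95, 16: 2.97, 13: 2.98, 6: 3.01,
         10: 3.14, 3: 3.19, 5: 3.30, 8: 3.53, 12: 3.58, 15: 4.43, 1: 4.58,
         7: 4.66, 9: 4.73}

mean_initial = sum(initial.values()) / len(initial)
mean_total = sum(total.values()) / len(total)

# Seconds per tracing unit implied by the paired figures in the text
# (.981 units = .77 sec.; 3.45 units = 2.7 sec.).  Inferred, not stated.
sec_per_unit_initial = 0.77 / mean_initial
sec_per_unit_total = 2.7 / mean_total

print(f"initial: {mean_initial:.3f} units, total: {mean_total:.3f} units")
print(f"sec/unit: {sec_per_unit_initial:.3f} vs {sec_per_unit_total:.3f}")
```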

3. Error. The mean difference between the 
two voltmeter readings accepted by the six 
Ss was .365 volts. Much of this error is at- 
tributable to four tasks (9, 7, 15, and 1), 
where the extreme sensitivity in the region of 
final adjustment sometimes prevents comple- 
tion of the task within the permitted four 
seconds. Discounting these four tasks, there 
are no significant differences between the remaining 12 tasks. The average accuracy for the 12 tasks is .22 v.; this corresponds to
approximately one-fifth of a scale division, 
and must be close to the limit of achievement 
with a simple voltmeter scale. Thus it would 
seem that under normal conditions Ss persist 
with each task until matching is achieved to the best of their visual ability.

Reliability of the data. A formal analysis 
of reliability may be obtained by calculating 
the odd-even correlation coefficient for suc- 
cessive 5-min. periods of testing. The values 
for initial response time (r = .89) and total 
response time (r= .85) reach the level re- 
quired for a satisfactory test (3), comparing 
well with values obtained for other pursuit 
meters (2). The coefficient for error is low (r = .35). This is partly due to difficulty
in measuring, since this component sometimes 
amounts to less than .1 mm. on the galva- 
nometer tracing. Inaccuracies in the error 
measurement are not normally important if 
Ss persist with each task to the limits of 
visual ability. However, if it is specifically 
desired to measure error, the galvanometer 
deflection for a given voltage imbalance may 
be increased. 
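The text does not say exactly how the 5-min. periods were paired for the odd-even coefficient. The sketch below takes one plausible reading, which is an assumption: each odd-numbered period is paired with the even-numbered period that follows it, pairs are pooled over Ss, and a product-moment correlation is computed. The function names and the example scores are hypothetical.

```python
import math


def pearson_r(x, y):
    """Product-moment correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def odd_even_pairs(scores_by_subject):
    """Pair each odd-numbered period with the even-numbered period that
    follows it (1 with 2, 3 with 4, ...), pooling pairs over subjects."""
    odd, even = [], []
    for periods in scores_by_subject.values():
        for i in range(0, len(periods) - 1, 2):
            odd.append(periods[i])
            even.append(periods[i + 1])
    return odd, even


if __name__ == "__main__":
    # Hypothetical mean initial response times (units) for two Ss over six sessions.
    scores = {"S1": [0.95, 0.97, 0.93, 0.94, 0.96, 0.95],
              "S2": [1.10, 1.08, 1.12, 1.11, 1.09, 1.10]}
    odd, even = odd_even_pairs(scores)
    print(f"odd-even r = {pearson_r(odd, even):.2f}")
```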


Change of Performance During Pressure 
Breathing 


The stress situation of a high breathing 
pressure was chosen partly on account of the 
importance of this maneuver in present-day 
aviation physiology, and partly because little 
previous work had been done in this field. 
Two relevant papers (1, 7) have shown some 
decrement of performance using pressure- 
breathing equipment at an altitude of 47,000 
feet. 

The present observations were made at 
ground level (to avoid possible changes due 
to coincident hypoxia), trunk counterpressure 
was provided, and much higher breathing 
pressures were used. For the first 10 experi- 
ments, a pressure of 78 mm. Hg was main- 
tained for 4 min. Changes in performance 
have been expressed as a percentage of con- 
trol values (Table 3). A similar change was 
observed for each of the four minutes. The 
initial response time (A) and error were increased, while the total response time (B) and the time occupied by muscular movement (B-A) tended to decrease slightly.

Table 3

Changes in Psychomotor Performance Produced by Breathing at Pressure of 78 mm. Hg
(10 experiments)

                           Mean Value (%)   SE (%)    t      n    P
First minute
  Initial response time       +10.0          ±4.0     2.50    9   .02-.05
  Total response time          −0.8          ±2.6     0.24    9   -
  Error                       +39.8         ±10.3     3.87    9   .001-.01
  B-A                          −3.4          ±3.3     1.04    9   -
Second minute
  Initial response time        +6.0          ±6.7     0.90    9   -
  Total response time          −2.2          ±1.5     1.47    9   .20-.10
  Error                       +28.8          ±8.9     3.23    9   .01
  B-A                          −3.2          ±2.7     1.18    9   -
Third and fourth minutes (7 experiments)
  Initial response time        +3.8
  Total response time          −4.1
  Error                       +40.7
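In Table 3, and in Tables 5 and 6, the t values equal the mean change divided by its standard error (for example, 10.0/4.0 = 2.50 in Table 3 and 5.5/2.0 = 2.75 in Table 5). The n column equals the number of experiments minus one (9 for 10 experiments, 12 for 13), so it appears to give degrees of freedom. The sketch below reproduces that kind of calculation. It assumes each experiment contributes one percentage change from its control value. The text does not say which control run the author used (initial, final, or their mean), and the data shown are hypothetical.

```python
import math


def percent_change(control, test):
    """Change from the control value, as a percentage of control."""
    return 100.0 * (test - control) / control


def mean_se_t(changes):
    """Mean, standard error of the mean, t = mean / SE, and degrees of freedom."""
    n = len(changes)
    mean = sum(changes) / n
    sd = math.sqrt(sum((c - mean) ** 2 for c in changes) / (n - 1))
    se = sd / math.sqrt(n)
    return mean, se, mean / se, n - 1


if __name__ == "__main__":
    # Hypothetical (control, pressure-breathing) initial response times in tracing units.
    runs = [(0.95, 1.06), (1.02, 1.10), (0.90, 1.01), (1.10, 1.18), (0.98, 1.05)]
    changes = [percent_change(c, p) for c, p in runs]
    mean, se, t, df = mean_se_t(changes)
    print(f"mean {mean:+.1f}%  SE {se:.1f}%  t = {t:.2f}  (df = {df})")
```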

An additional five observations were made 
at a breathing pressure of 109 mm. Hg 
(Table 4). The error was consistently less 
than at the lower breathing pressure. This 
probably represents a reaction to experience 
of the test under conditions of pressure 
breathing; in support of this view it will be 
noled that in subject R. J. S. a further test 


at 78 mm. Hg gives an even smaller error. 
While most of the Ss had previous experi- 
ence of pressure breathing, none had previ- 
ously endeavored to carry out a skilled task 
during the period of pressurization, and it 
would seem that practice is required to 
achieve a good score under these conditions. 
The relevance of this observation to the in- 
doctrination of aircrew needs no further em- 
phasis. 


Table 4

Comparison of Changes in Performance at Breathing Pressures of (a) 78 mm. Hg, (b) 109 mm. Hg.
Percentage of Control Values

               Initial Response   Total Response    Error              B-A
               Time               Time
               78       109       78      109       78       109       78      109
Subject        mm. Hg   mm. Hg    mm. Hg  mm. Hg    mm. Hg   mm. Hg    mm. Hg  mm. Hg
R. J. S. (1)   −11.4    −14.1     −1.8    −6.1      +100.7   +97.4     +3.0    −2.8
R. J. S. (2)    −9.7    −12.0     −0.8    −1.5       +21.5    +5.3     +1.8    +3.0
J. E.          +22.1    +23.6     −3.4    −2.8       +15.4   −28.1    −10.2   −12.2
D. P.           +8.2    +26.2     −1.7    +4.1       +23.2    +4.6     −5.3    −4.3
L. H.          +20.6    +28.2     −3.4    −0.8       +25.0    +9.0     −7.8    −2.4
Mean            +6.0    +10.4     −2.2    −1.4       +37.2   +17.6     −3.7    −3.7




An attempt has been made to define the 
factors underlying the changes of perform- 
ance. The pressure-breathing situation—en- 
closure of the head in a rather hot helmet, 
some increase of respiratory effort, and the 
dull pain of extreme peripheral venous con- 
gestion—tends to produce the reactions of 
panic. There is some increase of muscle-action potentials during pressure breathing,³
and it seems reasonable to suggest that 
muscular tension is increased, thus helping 
to produce a faster muscular response. A 
further factor governing both total response 
time and error is the degree of perseverance 
shown by S; experience here helps S to per- 
sist with his task in the face of discomfort 
and a tendency to panic. Some of the in- 
crease of initial response time may be at- 
tributable to central factors, but part at least 
is due to a restriction of lateral movement of 
the head during pressurization. This can be 
overcome by practice and determination, as 
may be seen from the more normal initial re- 
sponse times observed during successive min- 
utes of pressure breathing. 

It is difficult to reproduce the rapid venous distension that occurs in all unpressurized areas of the body, but some of the changes occurring in the arm can be simulated by applying a sphygmomanometer cuff. Thirteen experiments with the cuff inflated to 78 mm. Hg showed a significant fall in the initial response time and no change in the other parameters of performance (Table 5). The fall of initial response time may be attributed tentatively to facilitation by the discomfort and pain of venous congestion arising from the same arm. More generalized venous congestion probably produces a similar effect. If so, limitation of neck movement during pressurization has even more influence on the initial response time than Table 4 would suggest.

Table 5

Changes in Psychometric Performance Produced by Sphygmomanometer Cuff Inflated to 78 mm. Hg
(cuff applied to r. upper arm; results expressed as % of control value; 13 experiments)

                          Mean Value (%)   SE (%)    t      n    P
Initial response time         −5.5          ±2.0     2.75   12   .02-.01
Total response time           +0.7          ±1.7     0.32   12   -
Error                         +4.4          ±6.0     0.57   12   -
B-A                           +1.0          ±2.1     0.48   12   -

Table 6

Changes in Psychometric Performance Produced by Wearing Pressure-Breathing Helmet
(13 experiments)

                          Mean Value (%)   SE (%)    t      n    P
Initial response time         +4.5          ±3.5     1.28   12   .3-.2
Total response time           +2.7          ±1.0     2.70   12   .02
Error                        +16.9         ±10.3     1.64   12   .1-.2
B-A                           +1.7          ±1.8     0.95   12   -

³ R. J. Shephard, unpublished observations.

Experiments with the helmet unpressurized 
indicate the effect of wearing the equipment 
alone (Table 6). Even in the unpressurized 
state, lateral movement of the head is less 
easy than it is normally, and this is probably 
responsible for the changes observed—a sig- 
nificant increase of the total response time, 
and some tendency to increase of error. 

Neither local venous congestion nor the wearing of pressure-breathing equipment accounts for all the changes observed during
pressure breathing; part of the performance 
decrement must be attributed to other fac- 
tors, including possibly a specific panic re- 
action to the pressurization or a decreased 
cerebral blood flow. 


Discussion 


Aviation psychology is concerned largely 
with performance under conditions of stress, 
and in an aviation laboratory the practical 
value of a psychometric test is often deter- 
mined by its ability to detect changes asso- 
ciated with specific stresses. The apparatus 
described above seems to possess sufficient 
sensitivity to show both an immediate “panic” 
response to the stress of pressure breathing, 
and a progressive improvement as the Ss become accustomed to performing a skilled task in the pressurized state. It shows a number
of other favorable features. Readings can be 
obtained every 4 sec., and are presented as 
permanent objective records, suitable for statistical analysis. Under normal resting conditions, results for any one S show a satisfactory reliability, and there is little learning
effect. The intersubject variation (often con- 
sidered an index of a good test) is quite large. 
Further, the external appearance of the ap- 
paratus bears some resemblance to the con- 
trol panel of an aircraft, and the adjustment 
required is at least as difficult as the tasks 
normally encountered in flying. The chief 
drawback is the time required for the analy- 
sis of the tracings—this could certainly be 
decreased by the use of a planimeter, and 
with adequate training the measurements 
might be made by a good technician. 

It is of some interest to consider the psychological functions that are being measured
by the apparatus. The initial response time 
probably represents the time required to read 
the left-hand dial, memorize the reading, and 
initiate appropriate coordinated movements 
of the wrist, although some individuals (de- 
pending partly on personality) may initiate 
movements with inadequate memorization and 
refer to the left-hand dial again during the 
period of adjustment. The total response 
time represents the time required to com- 
plete a fairly simple coordinated task. It in- 
cludes in addition to the initial response time 
the period occupied by muscular movement 
and the time required for making a final 
judgment of accuracy. Sometimes S over- 
shoots the balance point by 1-2 v., and it is 
then possible to measure a further parameter: the reaction time for small corrective movements. In contrast to some preliminary observations of Davis,⁴ the time required to
initiate such movements with the present ap- 
paratus (0.2-0.3 sec.) seems at least as great 
as the expected “simple” reaction time for a 
visual stimulus. The factor governing error 
seems normally to be the ability to read a 
simple voltmeter scale; under conditions of 
stress it is likely that the judgment and perse- 
verance of S also become involved. 


⁴ R. Davis, personal communication, 1955.




The apparatus is capable of modification 
to test other psychological functions. The 
adjustment available for the selective study 
of accuracy of performance has already been 
mentioned. For the testing of addition or 
subtraction, a galvanometer in the recording 
camera may be aligned at, for instance, 3 v. 
off balance; S is then required to add or sub- 
tract three from each of the left-hand dial 
readings. For code substitution, an additional rheostat, R₁, is inserted in series with the uniselector, and S is given a table showing the values of V₂ that are required to balance different readings of V₁. To test an S's powers of discrimination, he may be instructed not to respond to one reading (for example, V₁ = 16.4 v.). Finally, if interested in
visual contrast discrimination, special volt- 
meter dials may be prepared having the fig- 
ures more clearly marked at one end of the 
scale than at the other. Other possible appli- 
cations could be described, but these exam- 
ples are sufficient to illustrate the versatility 
of the apparatus. 
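The variants just listed change only the rule that defines a correct V₂ setting. A minimal sketch of those rules follows. The function name, its parameters, and the example code table are hypothetical. The 3 v. offset and the 16.4 v. no-go reading are the examples given in the text.

```python
def required_v2(v1, mode="match", offset=3.0, code_table=None, no_go=None):
    """Return the V2 setting that counts as correct for a V1 reading,
    or None when S should not respond (discrimination variant).

    mode: "match" (standard task), "add" or "subtract" (galvanometer
    aligned `offset` volts off balance), or "code" (V2 looked up in a
    substitution table given to S).  All names here are illustrative.
    """
    if no_go is not None and abs(v1 - no_go) < 1e-9:
        return None                      # designated reading: withhold response
    if mode == "match":
        return v1
    if mode == "add":
        return v1 + offset
    if mode == "subtract":
        return v1 - offset
    if mode == "code":
        return code_table[v1]            # KeyError if the table lacks this reading
    raise ValueError(f"unknown mode: {mode!r}")


if __name__ == "__main__":
    table = {8.0: 12.5, 14.0: 6.0}       # hypothetical substitution table
    print(required_v2(10.0), required_v2(10.0, "add"), required_v2(10.0, "subtract"))
    print(required_v2(8.0, "code", code_table=table), required_v2(16.4, no_go=16.4))
```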


Summary 


Description is given of a null-balance elec- 
trical pursuit meter based on a Wheatstone 
bridge circuit. Evaluation in a group of 
normal Ss shows that under resting condi- 
tions it yields repeatable measurements of an 
initial response time and a total response 
time for a coordinated manual task of the 
type encountered in flying an aircraft. Pos- 
sible applications include addition and sub- 
traction problems, code substitution, dis- 
crimination tests, and measurements of visual 
contrast discrimination. 

During the stress of high-pressure breath- 
ing, there is a significant increase of initial 
response time and error, while the total re- 
sponse time tends to be reduced. These 
changes cannot be reproduced by local venous 
congestion or the wearing of pressure breath- 
ing equipment alone, and it is suggested that 
they represent a panic reaction to the pres- 
surization. Training gives a marked improve- 
ment in the ability of all subjects to perform 
the task during the period of pressurization. 


Received November 7, 1955.




References

1. Barach, A. L., Eckman, M., Bloom, W. L., Eckman, I., Rule, C., Rumsey, C. C., & Wortis, J. Studies on positive pressure respiration. IV. Subjective, clinical and psychological effects of continuous positive pressure breathing at high altitudes. J. aviat. Med., 1947, 18, 252-258.

2. Finan, J. L., Finan, S. C., & Hartson, L. D. A review of representative tests used for the quantitative measurements of behavior decrement under conditions related to aircraft flight. USAF, WADC Tech. Rep., 1949, No. 5830.

3. Grether, W. F. Criteria of anoxia tolerance; I. Development of psychological tests for use in the altitude chamber. USAF Sch. Aviat. Med. Proj. Rep., 1942, Proj. No. 89 (Rep. No. 1, 2).

4. Hirsch, A. Expériences chronoscopiques sur la vitesse des différentes sensations et de la transmission nerveuse. Soc. Sci. naturel Bull., 1861-1864, 6, 100-114. (Quoted in Woodworth [10].)

5. Loucks, R. B. An evaluation of various psychological performance tests for altitude chamber research. USAF Sch. Aviat. Med. Proj. Rep., 1944, Proj. No. 202 (Rep. No. 1).

6. Mather, K. Statistical analysis in biology. London: Methuen, 1951.

7. Melton, A. W. (Ed.) Apparatus tests. Washington: U. S. Government Printing Office, 1947. (AAF Aviat. Psychol. Program Res. Rep. No. 4.)

8. Miles, W. R. Selected psychomotor measurement methods. Methods med. Res., 1950, 3, 142-218.

9. Miles, W. R. Correlation of reaction and coordinating speed with age. Amer. J. Psychol., 1931, 43, 377-391.

10. Woodworth, R. S. Experimental psychology. London: Methuen, 1950.


The Journal of Applied Psychology 
Vol. 40, No. 5, 1956 


The Duration of Movement Components in a Repetitive 
Task as a Function of the Locus of a 
Perceptual Cue¹


J. Richard Simon²


University of Wisconsin 


This study is concerned with the general 
problem of the role of perceptual processes in 
human motion. The specific variable manipulated is the locus of a perceptual cue
within a repetitive patterned motion. A sim- 
ple assembly operation is chosen for study 
because of its obvious importance in the in- 
dustrial setting and because of the ease of 
defining and timing the parts or components 
of the motion cycle. 

For several years, electronic methods of 
motion analysis have been used to record the 
duration of the component movements in 
various motion cycles (5, 10, 12). Simon 
and Smader (10) extended this methodologi- 
cal approach to the problem of the role of 
perceptual processes in motion. They found 
that a specific visual discrimination imposed 
on part of a motion cycle had a generalized 
effect on the entire motion pattern. In that 
initial study no attempt was made to con- 
trol or vary the point in the work cycle at 
which the discrimination occurred. This 
problem has been instrumentally solved and 
the solution applied in the present study. 

By systematically introducing a perceptual 
cue into the various components of an other- 
wise unchanged pattern of motion, the role 
of the locus of the cue in defining the tem- 
poral relations within the motion cycle was 
determined. Information of this sort may 
be applied to the design of more efficient 
work operations. The evidence presented 
here has relevance also to the evaluation of 


¹ This article is based on a dissertation submitted in partial fulfillment of the requirements for the degree of Doctor of Philosophy at the University of Wisconsin. Financial support of this research came from the National Science Foundation under a grant-in-aid for a project on perception and motion, directed by Karl U. Smith. The writer is indebted to Professor Smith for his guidance.

² Now at the American Institute for Research, Pittsburgh, Penna.


some of the basic time-and-motion study con- 
cepts. 


Method 
Apparatus 


Figure 1 is a sketch of the simplified assembly arrangement used. The assembly plate is [?] in. thick and 9 in. square. It is divided in half by a vertical line. Sixty-four holes, [?] in. in diameter and 1 in. apart, have been machined in the plate, eight holes to a row. The parts-supply bin is 5 in. from the nearest row of holes in the assembly plate. The bin is 5[?] in. wide and 5[?] in. long. A curved piece of sheet steel forms the floor. Across the front opening of the bin is a thin metal bar. The S must reach over this bar in order to grasp a pin from the bin.
The bin contains 80 precision-made pins [?] in. in diameter and 1 in. long. The ends of the pins are slightly tapered. Forty pins are cadmium-plated, which gives them a silver color easily distinguishable from the other 40 pins, which are copper-plated.
A transparent colored barrier may be easily inserted 
across the front opening of the supply bin. With 
this barrier in place, the pins, though clearly visible, 
are indistinguishable with regard to color. It will 
be shown later how this control over the appearance 
of the pins is used in varying the locus of the per- 
ceptual cue. 

The electronic motion analyzer, pictured at the 
right in Fig. 1, provides separate and automatic 
measurement of the durations of the four compo- 
nents of the work cycle. The total time per trial 
for these components, viz., grasp, loaded travel, as- 
sembly, and unloaded travel, is recorded in hun- 
dredths of a second on four precision clocks. 

The analyzer, described previously (10, 11), consists of a four-channel electronic relay circuit actuated by a subthreshold current. The S acts as a key in the circuit; that is, the clocks are automatically energized by his movements in performing the task.
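The analyzer runs each component clock only while its component is in progress, and totals are read in hundredths of a second. That behavior can be modeled in software. The sketch below is a hypothetical analogue, not a description of the relay circuit. It assumes a time-ordered list of events marking when each component begins; the real apparatus derives these from S's contacts with the bin and plate.

```python
from collections import defaultdict

COMPONENTS = ("grasp", "loaded travel", "assembly", "unloaded travel")


def component_totals(events, end_time):
    """Sum the time spent in each component over one trial.

    events: time-ordered (seconds, component) pairs, each marking the moment a
    component begins; the last component runs until end_time.
    Returns totals rounded to hundredths of a second, like the four clocks.
    """
    totals = defaultdict(float)
    boundaries = events[1:] + [(end_time, None)]
    for (start, component), (stop, _) in zip(events, boundaries):
        totals[component] += stop - start     # only this component's clock runs
    return {c: round(totals[c], 2) for c in COMPONENTS}


if __name__ == "__main__":
    # Hypothetical event times for two pins.
    trial = [(0.00, "grasp"), (0.30, "loaded travel"), (0.80, "assembly"),
             (1.10, "unloaded travel"), (1.50, "grasp"), (1.75, "loaded travel")]
    print(component_totals(trial, end_time=2.00))
```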

A main part of the apparatus for purposes of this 
study is the cue-control system. It consists of the 
transparent barrier and two neon signal lights lo- 
cated between the bin and the assembly plate. By 
use of the signal lights, the barrier, and proper in- 
structions, it is possible to systematically introduce 
a perceptual cue into any phase of the basic motion 
cycle and to determine its effect on the time required for the four component movements of the cycle.

Fig. 1. Sketch of simplified assembly arrangement.


Procedure and Experimental Design 


The independent variable was the locus of a per- 
ceptual cue within a pattern of motion. The de- 
pendent variables were the durations of the four 
component movements comprising the motion cycle. 
The S’s task consisted of inserting metal pins into 
holes in an assembly plate. A trial was defined as 
the transporting of 40 pins, one at a time, from the 
parts supply bin to the work area and inserting each 
one in a hole in the plate. This basic repetitive op- 
eration was held constant throughout all the experi- 
mental conditions. 

Six variations of the basic task made up the six 
experimental conditions. The first two variations 
were control conditions. The last four consisted of 
systematically varying the point within the work 
cycle at which a perceptual cue was introduced. 

1. Control—no discrimination. This variation in- 
volved no perceptual cues and therefore served as a 
basis for comparison with Conditions 4 through 6 
below. The S was simply instructed to assemble 
the 40 pins as rapidly as possible, disregarding their color.

2. Control for barrier—no discrimination. This 
condition was the same as Condition 1 except that 
the transparent barrier was inserted in front of the 
bin. The barrier, used to place the perceptual cue 
in the loaded-travel component of the cycle, may 
itself have affected the durations of any or all of the 
component movements in the cycle. Therefore this 
condition was included to serve as the control for 
Condition 3. 

3. Cue in loaded travel. The transparent barrier 
was inserted in front of the bin. The S was in- 
structed to pick up one pin at a time, inserting the 
copper pins in the left side of the plate and the 
silver pins in the right side. Since the barrier made 
the pins indistinguishable with regard to color, S 
was not able to make the necessary discrimination until he had grasped a pin and started to carry it to the assembly area; that is, he made the discrimination in the loaded-travel part of the cycle.

4. Cue in unloaded travel. The S was instructed 
to alternate, first picking up a copper pin and then 
a silver pin, until he had 40 pins inserted in the 
plate. Here the discrimination must of necessity 
take place before S can grasp the correct colored 
pin, i.e., in the unloaded-travel part of the work
cycle. 

5. Cue in grasp. One or the other of two signal 
lights automatically flashed as S grasped a pin from 
the bin. Since S’s contact with the bin activated 
the light and since the light went off as soon as 
grasping the pin was complete, the perceptual cue 
was available during, and only during, the grasp component of the cycle. If the left light went on, S
placed the pin he had grasped in the left side of the 
plate. If the right light went on, he placed the pin 
in the right side of the plate. Color of the pins was 
disregarded. Placing the pin in a hole automatically 
advanced a stepping relay which controlled the se- 
quence of lights so that upon grasping the next pin, 
a new signal was presented. 

6. Cue in assembly. The same two signal lights 
were used, only this time S’s contact with the as- 
sembly plate presented a signal light and breaking 
contact with the plate turned off the light. The 
light signaled where the next pin was to be placed, 
i.e., as S placed each pin in the plate, he received the
cue necessary to perform the following motion cycle. 
Grasping a pin from the bin automatically advanced 
the stepping relay so that upon assembling this pin, 
a new signal was presented. 

The method of advancing the stepping relay, as 
just described, eliminated the complicating factor of 
lag or reaction time of the relay. That is to say, 
the signal light was presented simultaneously with 
completion of the circuit by S, since the relay al- 
ready had been advanced into position by the previ- 
ous work movement. The stepping relay generated 
a random sequence of light signals which changed 
from trial to trial. 
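The advance-ahead arrangement of the stepping relay can be illustrated with a small software model. This is a sketch under stated assumptions, not a description of the original circuitry: the class `SteppingRelay`, its method names, and the two-valued left/right signal are hypothetical, and a seeded pseudorandom generator stands in for the relay's random sequence. What it shows is that the next signal is selected by the preceding movement, so reading it when S closes the circuit involves no relay lag.

```python
import random


class SteppingRelay:
    """Hypothetical model of the stepping relay used in Conditions 5 and 6.

    The relay is stepped by the previous work movement, so the signal for the
    next cycle is already in position when S closes the circuit.
    """

    def __init__(self, n_signals, seed=None):
        rng = random.Random(seed)  # a different seed gives a different trial sequence
        self.sequence = [rng.choice(["left", "right"]) for _ in range(n_signals)]
        self.position = 0

    def current_signal(self):
        # Read at the instant S's contact completes the circuit: no stepping delay.
        return self.sequence[self.position]

    def advance(self):
        # Triggered by the preceding movement (e.g., inserting a pin in Condition 5).
        self.position += 1


# One simulated 40-pin trial of Condition 5.
relay = SteppingRelay(n_signals=40, seed=1)
shown = []
for _ in range(40):
    shown.append(relay.current_signal())  # grasping a pin lights the lamp
    relay.advance()                       # inserting the pin steps the relay
```

Because `advance` is called by the movement that precedes the next contact, `current_signal` only reads a value that is already in place, which mirrors the elimination of relay lag described above.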

Thirty right-handed college students were used as 
Ss. Each S performed under all six variations of 
the task. A latin-square design was used to control 
individual differences and order of presentation of 
the experimental conditions. Each S was assigned 
to one of 30 sequences of conditions provided by 
five independently drawn 6 × 6 latin squares, and he
performed only in his assigned sequence of condi- 
tions for the duration of the experiment. The Ss 
were run in groups of six per week. Each group of 
Ss completed one of the independent latin squares. 
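The counterbalancing scheme can be sketched as follows. The report does not say how the five squares were drawn. The sketch assumes one common shortcut: shuffle the rows, columns, and symbols of a cyclic 6 × 6 square. This always yields a valid Latin square, although it does not sample uniformly from all of them. The function names and condition labels are illustrative.

```python
import random

CONDITIONS = [
    "control: no perceptual cue",
    "control for barrier: no cue",
    "cue in loaded travel",
    "cue in unloaded travel",
    "cue in grasp",
    "cue in assembly",
]


def random_latin_square(n, rng):
    """Shuffle the rows, columns and symbols of the cyclic n x n square.

    Every result is a Latin square (each symbol once per row and column).
    """
    base = [[(r + c) % n for c in range(n)] for r in range(n)]
    rows, cols, symbols = list(range(n)), list(range(n)), list(range(n))
    rng.shuffle(rows)
    rng.shuffle(cols)
    rng.shuffle(symbols)
    return [[symbols[base[r][c]] for c in cols] for r in rows]


def condition_sequences(n_squares, conditions, seed=None):
    """One order of conditions per subject: each row of each square is one S."""
    rng = random.Random(seed)
    sequences = []
    for _ in range(n_squares):
        square = random_latin_square(len(conditions), rng)
        sequences.extend([[conditions[k] for k in row] for row in square])
    return sequences


# Five independent squares give 30 sequences, one per S; each weekly group
# of six Ss completes one square.
sequences = condition_sequences(5, CONDITIONS, seed=1956)
```

Within each block of six sequences, every condition occupies every serial position exactly once. This is the property that lets order of presentation be separated from the condition effects.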

The Ss were tested for five consecutive days. On 
Days 1, 2, and 3, S ran through each experimental 
condition twice, making a total of 12 trials. On 
Days 4 and 5 each experimental condition was re- 
peated three times for a total of 18 trials. The ad- 
ditional trials on the last two days were added in 
an attempt to obtain more reliable measures of per- 
formance. 


Duration of Movement Components 


Results 


The performance of 30 Ss on Day 5 was 
analyzed to determine the effects of the ex- 
perimental conditions on the component 
movements of the work cycle. A median 
score for each component of the motion un- 
der each experimental condition was deter- 
mined. 

Four separate analyses of variance were 
performed,³ one for each of the component
movements. These analyses⁴ brought out
the fact that the experimental conditions pro- 
duced significant (p < .01) variations in the 
durations of all four component movements 
of the work cycle, viz., grasp, loaded travel, 
assembly, and unloaded travel. 
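Footnote 3 mentions Bartlett's chi-square test for homogeneity of variance, applied before these analyses. Below is a minimal sketch of the statistic that uses only the Python standard library. The function name is hypothetical. The result would be referred to a chi-square table with k - 1 degrees of freedom.

```python
import math
from statistics import variance


def bartlett_chi_square(*groups):
    """Bartlett's test statistic for equality of variances in k groups.

    Refer the result to the chi-square distribution with k - 1 df; a small
    value is consistent with sampling from populations with a common variance.
    """
    k = len(groups)
    dfs = [len(g) - 1 for g in groups]
    variances = [variance(g) for g in groups]  # unbiased (n - 1) variances
    df_within = sum(dfs)  # N - k
    pooled = sum(d * v for d, v in zip(dfs, variances)) / df_within
    numerator = df_within * math.log(pooled) - sum(
        d * math.log(v) for d, v in zip(dfs, variances)
    )
    # Bartlett's correction factor for small samples.
    correction = 1 + (sum(1 / d for d in dfs) - 1 / df_within) / (3 * (k - 1))
    return numerator / correction
```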

Let us now consider separately the four 
components of the task beginning with loaded 
travel. Figure 2 is a bar graph of the mean 
duration of the loaded-travel part of the 
cycle under the six experimental conditions. 
Since the experimental conditions produced 
significant variation in the duration of this 
part of the work movement, the crucial step 
was a comparison between the conditions in- 
volving perceptual cues and the control con- 
ditions in which no specific cues occurred. 
By using as a base the duration of loaded 
travel in the control condition, we were able 
to determine how the loaded-travel compo- 
nent was affected by locating the perceptual 
cue in the four different parts of the cycle. 
To accomplish this comparison a Duncan 
Range Test (4) was performed. The Duncan 
Test determines the significance of differences 
between ranked treatment means in an analy- 
sis of variance. Essentially, it indicates the 
number of significant gaps (at the 5% level) 
between the ranked means. 
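The logic of the Duncan test can be sketched in a few lines. This is an illustration under stated assumptions, not the computation actually carried out. The significant studentized ranges (`ssr`) must be taken from Duncan's published tables for the chosen level and the error degrees of freedom, and the values in the usage line are placeholders, not entries from those tables. Ranges are examined from widest to narrowest. Following Duncan's rule, no pair of means lying inside a range already found nonsignificant is declared different.

```python
import math


def duncan_significant_pairs(means, mse, n, ssr):
    """Duncan's multiple range test on treatment means.

    means: condition -> treatment mean
    mse:   error mean square from the analysis of variance
    n:     number of observations behind each mean
    ssr:   p -> significant studentized range for a span of p ranked means
    Returns the (lower, higher) pairs whose difference is significant.
    """
    ranked = sorted(means.items(), key=lambda item: item[1])
    std_err = math.sqrt(mse / n)
    k = len(ranked)
    nonsignificant = []  # index ranges (i, j) already judged homogeneous
    significant = []
    for p in range(k, 1, -1):  # widest spans first
        for i in range(k - p + 1):
            j = i + p - 1
            # No difference inside a nonsignificant range is tested further.
            if any(a <= i and j <= b for a, b in nonsignificant):
                continue
            shortest_significant_range = ssr[p] * std_err
            if ranked[j][1] - ranked[i][1] > shortest_significant_range:
                significant.append((ranked[i][0], ranked[j][0]))
            else:
                nonsignificant.append((i, j))
    return significant


# Hypothetical means and placeholder ssr values, for illustration only.
pairs = duncan_significant_pairs(
    {"A": 1.0, "B": 1.1, "C": 2.0}, mse=0.04, n=4, ssr={2: 3.0, 3: 3.2}
)
```

A "gap" in the sense used above corresponds to two adjacent ranked means that fall on opposite sides of a significant comparison.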


3 Before the analyses of variance were computed, 
the data were subjected to the Bartlett chi-square 
test for homogeneity of variance. The data satis- 
fied the assumption of random sampling from popu- 
lations with a common variance. 

4 Summaries of the analyses of variance and other
statistical tests referred to in this report have been
deposited with the American Documentation Institute.
Order Document No. 4973 from ADI Auxiliary
Publications Project, Photoduplication Service,
Library of Congress, Washington 25, D. C., remitting
in advance $1.25 for microfilm or $1.25 for
photocopies. Make checks payable to Chief,
Photoduplication Service, Library of Congress.




In Fig. 2 the means are ranked from low 
to high. Results of the Duncan Test are in- 
dicated by the brackets. Note that the dura- 
tion of the loaded-travel movement is great- 
est when the perceptual cue is located in 
loaded travel. The Duncan Test indicated 
that this mean was significantly different 
from all the means ranked below it. The 
next longest duration of the loaded-travel 
movement occurred when the perceptual cue 
was placed in the grasp component of the 
cycle. This mean, too, differs significantly 
from all the others. There are no gaps between the remaining four means, indicating
that these experimental conditions do not 
produce significant variations in the duration 
of loaded travel. 

To summarize the results presented in Fig. 
2, a cue placed in the loaded-travel or grasp 
components significantly increased the dura- 
tion of loaded travel over the time required 
for the same motion in a control condition involving no perceptual cues. However, a perceptual cue placed in the assembly or unloaded-travel components did not significantly affect the duration of loaded travel.

[Figure: LOADED-TRAVEL COMPONENT. Ordinate: seconds. Bars, left to right: control for barrier, no cue; control, no perceptual cue; cue in assembly component; cue in unloaded-travel component; cue in grasp component; cue in loaded-travel component.]

Fig. 2. Duration of loaded-travel component under the six experimental conditions. Brackets indicate significant gaps between ranked treatment means.

[Figure: UNLOADED-TRAVEL COMPONENT. Ordinate: seconds. Bars, left to right: control, no perceptual cue; cue in grasp component; control for barrier, no cue; cue in assembly component; cue in loaded-travel component; cue in unloaded-travel component.]

Fig. 3. Duration of unloaded-travel component under the six experimental conditions. Brackets indicate significant gaps between ranked treatment means.

Figures 3, 4, and 5 present the remaining 
three parts of the work cycle in a similar 
fashion. Figure 3 shows the means of the 
unloaded-travel part of the work cycle. It 
can be noted that a cue placed in unloaded 
travel or loaded travel significantly increased 
the duration of the unloaded-travel motion 
over the time required for the same motion 
under a control condition involving no per- 
ceptual cue. However, a perceptual cue 
placed in the assembly or grasp components 
of the work cycle did not significantly affect 
the duration of unloaded travel. 

Figure 4 pictures the mean duration of the 
grasp component under the six experimental 
conditions. Placing a perceptual cue in any 
of the components of the task, viz., assembly, 
grasp, loaded travel, or unloaded travel, had 
the effect of increasing the duration of the 
grasp component over the time required for 


J. Richard Simon 


the same motion under a control condition 
involving no perceptual cue. There was also 
a significant gap between the two control 
conditions. This difference means that the 
presence of the transparent barrier per se in 
some way increased the duration of the grasp 
component. 

Figure 5 indicates that the time required 
for the assembly component was significantly 
increased only by placing the cue in the as- 
sembly part of the cycle. 

The principal findings of the present study 
are summarized in Table 1. Results of the 
four separate Duncan Tests are integrated in 
order to show at a glance the effects of any 
specific cue locus on the durations of all four 
component movements in the work cycle. In 
all cases the time required for a component 
movement is compared with the duration of 
the same movement under the appropriate 
control condition in which no perceptual cue 
was involved. For example, in line 1 of 
Table 1, when unloaded travel was the locus 
of the perceptual cue, we find a significant increase in the duration of the unloaded-travel component. Duration of the grasp component of the cycle also increased significantly. The times for loaded travel and assembly were not significantly altered.

[Figure: GRASP COMPONENT. Ordinate: seconds. Bars, left to right: control, no perceptual cue; control for barrier, no cue; cue in assembly component; cue in unloaded-travel component; cue in grasp component; cue in loaded-travel component.]

Fig. 4. Duration of grasp component under the six experimental conditions. Brackets indicate significant gaps between ranked treatment means.

Effects of practice on the component move- 
ments. Differences noted between the ex- 
perimental conditions on Day 5 appeared 
consistently over the first four days of prac- 
tice as well. The average decrease in the 
duration of the grasp component from Day 1 
to Day 5 was 23%. The assembly compo- 
nent decreased 8% over the same period. 
The durations of loaded travel and unloaded 
travel both decreased 14%. 

A test was made of the hypothesis that 
there is no difference in the amount of learn- 
ing which takes place in a component involv- 
ing a perceptual cue and in the same compo- 
nent when no perceptual cue is involved. In 
all cases we were able to reject the null hypothesis (p < .05). Thus, it appears that
more learning occurs in a component when it 
is perceptually loaded than when it is not. 


[Figure: ASSEMBLY COMPONENT. Ordinate: seconds.]

Fig. 5. Duration of assembly component under the six experimental conditions. Brackets indicate significant gaps between ranked treatment means.


Table 1

Effects of Cue Locus on the Durations of the Four
Component Movements in the Work Cycle

Locus of the                  Movement Component
Perceptual Cue     Unloaded travel   Grasp   Loaded travel   Assembly
Unloaded travel           +            +           0             0
Grasp                     0            +           +             0
Loaded travel             +            +           +             0
Assembly                  0            +           0             +

Note: + indicates component time significantly increased;
0 indicates component time not significantly altered.
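Table 1 can be encoded directly to check the second generalization drawn in the Discussion. In this sketch, all names are hypothetical and the entries are copied from Table 1. For each cue locus it lists the components, other than the cued one, whose duration was significantly increased.

```python
COMPONENTS = ["unloaded travel", "grasp", "loaded travel", "assembly"]

# Rows: locus of the perceptual cue. True = "+" in Table 1 (component time
# significantly increased); False = "0" (not significantly altered).
TABLE_1 = {
    "unloaded travel": [True, True, False, False],
    "grasp":           [False, True, True, False],
    "loaded travel":   [True, True, True, False],
    "assembly":        [False, True, False, True],
}


def affected_components(locus):
    """All components lengthened when the cue is placed at `locus`."""
    return [c for c, hit in zip(COMPONENTS, TABLE_1[locus]) if hit]


def spillover(locus):
    """Components lengthened other than the one that carries the cue."""
    return [c for c in affected_components(locus) if c != locus]
```

Every locus has a nonempty `spillover` list, and that is the non-independence the Discussion emphasizes.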


Reliability of measures. Median scores 
from Day 4 for each movement component 
under each experimental condition were cor- 
related with the comparable measure from 
Day 5. In general, all four components of 
the task showed a high level of consistency 
from day to day. The correlation coefficients 
were of the order of +.80 to +.90.
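The day-to-day reliability check can be sketched as a Pearson correlation, computed across Ss, between each S's Day 4 and Day 5 median for a given component and condition. The function names and the list-of-lists input layout are assumptions for illustration.

```python
import math
from statistics import mean, median


def pearson_r(xs, ys):
    """Product-moment correlation between two equal-length score lists."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def day_to_day_reliability(day4_times, day5_times):
    """Each argument holds, per S, the cycle times for one component under one
    condition on that day; the per-S medians are correlated across Ss."""
    day4 = [median(times) for times in day4_times]
    day5 = [median(times) for times in day5_times]
    return pearson_r(day4, day5)
```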


Discussion 


Two broad generalizations are suggested. 
First of all, it is apparent that a perceptually 
loaded component takes significantly more 
time than its counterpart which involves less 
perceptual load. We reached this conclusion 
by comparing the duration of any component 
when it was the locus of a perceptual cue 
with the duration of the same component 
under a control condition where no cues were 
involved. This result held true regardless of 
whether the cue was placed in unloaded 
travel, grasp, loaded travel, or assembly. Re- 
cently, Seymour (9) reported a study in 
which he varied the perceptual load of a 
constant length movement and found that 
duration of the movement increased as the 
perceptual requirements increased. As far as
is known, the present study is the first in
which the perceptual load of the manipula-
tive components of a task, i.e., assembly and
grasp, has been changed without actually
altering the character and complexity of the
movement.

A second generalization perhaps is of greater 
importance. Regardless of the point in the 
work cycle at which the cue was placed, the 
duration of at least one other component besides the one containing the perceptual cue
was significantly increased. For example, 
when the locus of the cue was unloaded 
travel or grasp, the duration of the immedi- 
ately adjacent movement component was sig- 
nificantly increased. It might appear that 
the information presented during the previ- 
ous component was organized or processed 
centrally and that this somehow increased 
the duration of the movement. However, 
when the locus of the cue was loaded travel 
or assembly, there appeared to be little con- 
nection between where the cue was presented 
and acted upon and which specific parts of 
the movement were affected. It is these spe- 
cific results that could not have been pre- 
dicted from existing evidence about the na- 
ture of high-speed performance (13). 

Explanation of the far-reaching effects of 
placing the cue in loaded travel probably lies 
in an interruption of an ongoing movement 
affecting the over-all rhythm of the task. 
Here, S had to make a discrimination and 
act upon it immediately. The 65% increase 
in the duration of loaded travel and the sig- 
nificant increases in grasp and unloaded travel 
emphasize the importance of advance infor- 
mation (7) for the smooth functioning of 
sensorimotor skills. Why the duration of assembly remained unaltered is not readily apparent. However, this does serve to illustrate that the effect of the perceptual cue is
not a simple overlapping to immediately ad- 
jacent components. 

The present results show conclusively that 
variations in the perceptual requirements of 
one part of the work cycle significantly affect 
the durations of other parts of the cycle. 
This finding is damaging evidence to a con- 
cept implicit in a good deal of the writing on 
time-and-motion study. This concept, that a 
movement pattern can be treated as a com- 
bination of independent and discrete ele- 
ments, provides the basis for a large number 
of predetermined time-standard systems. To 
set the standard for a new job, the operation 
is analyzed into elements, each element is as- 
signed a predetermined standard time, and 
the total of the element times with certain 
adjustments becomes the time allowed for 
the job. 
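The additive logic described here can be stated in a few lines. The sketch is a generic illustration, not any particular predetermined time system. The element names, element times, and single percentage allowance are hypothetical. Its defining assumption, visible in the plain sum, is that each element time is independent of what the other elements contain. That is the assumption the present results call into question.

```python
def predetermined_standard(element_times, allowance=0.0):
    """Additive time standard: sum of element standard times, plus a
    fractional allowance (e.g. 0.10 for 10%). Each element is assumed to
    take the same time regardless of the other elements in the cycle."""
    return sum(element_times.values()) * (1.0 + allowance)


# Hypothetical element times in seconds for one pin-assembly cycle.
cycle = {"unloaded travel": 0.20, "grasp": 0.30, "loaded travel": 0.20, "assembly": 0.30}
standard = predetermined_standard(cycle, allowance=0.10)
```

Under this model, loading one element perceptually could change only that element's entry. Table 1 shows that the durations of other elements changed as well.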


There have been, in the past, suggestions to 
the effect that the components of a move- 
ment are not independent (2, 8, 10). How- 
ever, the existence of the wide variety of pre- 
determined time standard systems in itself 
provides abundant proof that this point of
view is not widely known or accepted. 

Some of the predetermined time systems 
do attempt to handle perceptual factors either
by considering the perceptual response as a
separate element in the motion cycle and as-
signing it a standard time (3, 6), or by mak-
ing allowances for such things as the degree
of visual control required for the movement
(1). Even if it were possible to measure the
perceptual requirements of each part of the
work cycle and adjust the time standards ac-
cordingly, the problem still remains of con-
sidering the effect of the perceptual loading
of one factor on the durations of the other 
parts of the task. The question which arises 
is whether any predetermined time system 
will ever be able to predict accurately this 
complicated interrelation between perceptual 
processes and motion. 


Summary 


This study was concerned with the inter- 
relation of perceptual processes and work 
movements. The specific variable manipu- 
lated was the locus of a perceptual cue within 
a repetitive patterned motion. A simplified 
assembly task was used. It consisted of in- 
serting 40 metal pins into holes in an as- 
sembly plate. Special techniques made pos- 
sible the specification and control of the exact 
point in this work cycle at which a percep- 
tual cue was located. By systematically in- 
troducing the cue into various components of 
an otherwise unchanged pattern of motion,
the effects of the locus of the perceptual cue 
in defining the temporal relations within the 
motion cycle were determined. Electronic 
methods of motion analysis were used to re- 
cord separately and automatically the dura- 
tions of the four component movements of 
the work cycle, viz., unloaded travel, grasp, 
loaded travel, and assembly. 

Results indicated that, depending upon the 
locus of the cue, the durations of some com- 
ponents were increased while others were not 


affected. Two generalizations were suggested. 
First, it is apparent that a perceptually loaded 
component takes significantly longer than its 
counterpart which involves less perceptual 
load. Secondly, placing a perceptual cue in 
one part of a work cycle not only affects the 
duration of that part of the cycle, but also 
significantly affects the durations of certain 
other parts of the movement. This finding 
is damaging evidence to a concept implicit in 
a good deal of the writing in the time-and- 
motion study field. This concept is that a 
work movement can be treated as a combina- 
tion of independent and discrete elements. 
It seems unlikely that existing predetermined 
time systems can handle accurately the complicated interrelation between perceptual processes and motion which has been demonstrated in this study.


Received December 13, 1955. 


References 


1. Barkin, S. An evaluation of predetermined time 
standard systems. Time & Motion Stud., 
1954, 3, 24-32. 

2. Barnes, R. M., & Mundel, M. E. Studies of
hand motions and rhythm appearing in fac-
tory work. Univer. Iowa Stud. Engng, 1938,
Bull. 12.

3. Barnes, R. M. Motion and time study. New
York: Wiley, 1940.

4. Duncan, D. B. A significance test for differences 
between ranked treatments in an analysis of 
variance. Virginia J. Sci, 1951, 2, 171-189. 

5. Harris, S. J., & Smith, K. U. Dimensional analy- 
sis of motion: VII. Extent and direction of 
manipulative movements as factors in defin- 
ing motions. J. appl. Psychol., 1954, 38, 126- 
130. 

6. Holmes, W. G. Applied time and motion study. 
New York: Ronald, 1938. 

7. Leonard, J. A. Advance information in sensori- 
motor skills. Quart. J. exp. Psychol., 1953, 5, 
141-149. 

8. Schutt, W. H. Time study engineering. New 
York: McGraw-Hill, 1943. 

9. Seymour, W. D. Manual skills and industrial 
productivity. Instn Production Engrs J., 
April, 1954, 3-10. 

10. Simon, J. R., & Smader, R. C. Dimensional 
analysis of motion: VIII. The role of visual 
discrimination in motion cycles. J. appl. Psychol., 1955, 39, 5-10.

11. Smader, R. C., & Smith, K. U. Dimensional 
analysis of motion: VI. The component move- 
ments of assembly motions. J. appl. Psy- 
chol., 1953, 37, 308-314. 

12. Wehrkamp, R. A., & Smith, K. U. Dimensional 
analysis of motion: II. Travel distance effects. 
J. appl. Psychol., 1952, 36, 201-206. 

13. Welford, A. T. The psychological refractory pe- 
riod and the timing of high-speed performance 
—a review and a theory. Brit. J. Psychol, 
1952, 43, 2-19. 




The Journal of Applied Psychology 
Vol. 40, No. 5, 1956 


The Electronic Handwriting Analyzer and Motion Study 
of Writing¹


Karl U. Smith and Richard Bloom 


University of Wisconsin 


Although writing is an almost universal 
skill in modern Western society, there have 
been few if any really scientific studies of the 
different motions used in this performance. 
Recent studies in this field have been di- 
rected very largely to the questions of the 
pressure patterns in writing (2, 3, 4, 5, 8), 
and have involved the over-all measurement 
of the force and speed of writing. Older lit- 
erature of a psychological nature in the field 
dealt with the use and validity of graphologi- 
cal indices in indicating certain personality 
traits.

Like many other areas of motion study, the 
scientific investigation of movement patterns 
in handwriting has been limited by the meth- 
ods available for studying performance. Our 
primary concern here, then, is an attempt to 
improve methods of studying the movement 
characteristics of writing skills. Specifically, 
we have applied the principles of electronic 
motion analysis (1, 6, 7) to the measurement 
of the time required for the component movements in writing. A preliminary investigation² is described here in which an Electronic
Handwriting Analyzer is used to measure the 
durations of manipulation (contact) and 
travel movements in a writing task. 


Methods and Apparatus 


Figure 1 illustrates the general method used in the 
Electronic Handwriting Analyzer. This device is 
based on a principle of using the writer as an elec- 
tronic key, so that each time he touches the surface 
of the writing paper, he automatically starts and 
stops precision time clocks that measure the dura- 
tion of his writing movements. 


1 This research was supported by funds from the
National Science Foundation under a grant for the 
study of perception and motion. Financial support 
for building the equipment for this study came from 
the Graduate School Research Committee, The Uni- 
versity of Wisconsin. 

2 The basic computational work of this research
was aided by the Computing Service, The Univer- 
sity of Wisconsin. 




As shown in Fig. 1, the subject (S) sits at an 
ordinary writing table. In his left hand he holds a 
metal electrode that is attached to the switching 
circuit of an electronic motion analyzer. The sub- 
ject writes with a metal pencil filled with IBM 
electrographic leads. The paper used, Teledeltos paper,³ is also electrically conductive, and is connected to the switching circuit of the handwriting
analyzer located inside the electrical housing to the 
right of Fig. 1. When S makes contact with the 
writing paper, he immediately closes one side of the 
switching circuit of the handwriting analyzer, which 
operates a precision time clock. When the pencil 
is lifted from the paper after writing a letter or 
word, this first clock is automatically stopped and 
a second clock starts to run. The second clock 
measures the duration of the travel movement between the first writing contact and any subsequent
contact with the paper. 

The writing situation in this method is quite normal
except for the use of the special paper, the surface
of which appears to be somewhat smoother than that
of ordinary bond paper. The S does not feel the very
low-level current that is passed through his body in
operating the handwriting analyzer. Special
precautions are taken to prevent the possibility of
electric shock.

Figure 2 shows diagrammatically the circuits used 
in the Electronic Handwriting Analyzer. As noted
above, the subject in this task acts as an electronic
key completing a switching circuit, Mn. This relay
is closed when S touches the paper with his pencil,
starting the manipulation clock, Mr, that continues
to run as long as the pencil touches the paper.
When S lifts his pencil, however, to travel to the
next letter or word position, the switching circuit
automatically stops the first clock and starts the
travel clock, Tx, which measures the duration of this
travel movement. When the pencil touches the paper
again, the travel clock, Tx, is stopped and the
manipulation clock, Mz, started again. Upon the
completion of one line of writing across the paper,
the pencil touches the stopping plate on the right
margin, automatically stopping both clocks.
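The switching logic just described can be expressed as a small event-driven sketch: one clock runs while the pencil is on the paper, and the other runs while it is lifted. The event names (`down`, `up`, `stop`) and the list-of-timestamps input are assumptions for illustration. The actual analyzer did this with relays and interval timers, not software.

```python
def analyze_line(events):
    """Accumulate manipulation (contact) and travel time for one line of writing.

    events: time-ordered (seconds, kind) pairs, where kind is "down" (pencil
    touches the paper), "up" (pencil lifted) or "stop" (stop plate touched).
    Returns (manipulation_time, travel_time).
    """
    manipulation = travel = 0.0
    running = None  # "M" while the manipulation clock runs, "T" for travel
    started = 0.0
    for t, kind in events:
        # Charge the interval since the last event to whichever clock was running.
        if running == "M":
            manipulation += t - started
        elif running == "T":
            travel += t - started
        if kind == "stop":
            break  # the stop plate halts both clocks
        running = "M" if kind == "down" else "T"
        started = t
    return manipulation, travel


# Hypothetical line: two contacts, two lifts, then the stop plate.
line = [(0.0, "down"), (0.5, "up"), (0.75, "down"), (1.25, "up"), (1.5, "stop")]
totals = analyze_line(line)
```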

With certain modifications this analyzer is being 
made into a portable instrument that can be carried
into the schoolroom or hospital room. We look upon
this device as a widely applicable instrument for the
measurement of psychomotor skill for various
psychological and educational uses, particularly in
medical, industrial, military, and school situations.


3 Western Electric Co., New York City.
4 Potter Instrument Co., Boston, N. Y.




Motion Study of Writing 


Fig. 1. The Electronic Handwriting Analyzer. The subject holds an electrode in the left hand and writes on Teledeltos paper. The switching circuits of the handwriting analyzer are located inside the interval timer to the right. One of the interval timers, the one recording travel time, cannot be seen in this picture.

[Figure: circuit diagram; legible labels include TRAVEL ARC, STOP PLATE, and WRITING PAPER.]

Fig. 2. The circuit relations of the handwriting analyzer. The switching circuit is marked as Mz and Tr, and the interval timers as Mr and Tr. The writing on the paper completes the manipulation component of the circuit, and the travel component is automatically operated thereafter. Touching the stop plate stops the travel clock.

In the present preliminary study, the Electronic Handwriting Analyzer has been applied to the measurement of writing single Arabic numerals and letters of the alphabet. Ten college student Ss wrote
35 characters according to certain limited control
conditions. Each S wrote the 35 characters in dif- 
ferent orders chosen in terms of a randomizing pro- 
cedure. One letter was written in each trial. The 
S was instructed to begin at the left of the writing 
paper and write a given letter or number repeatedly 
across the paper to the right edge where the last 
character was to be written on the stopping plate. 
He was instructed to write at his usual speed, which 
meant in most cases that a line of writing included 
10 to 15 distinct letters or numbers. There was no 
attempt in the experiment to obtain data on speed 
writing or to change the usual method of writing in 
any way. One possible exception to this latter statement is that S was asked to omit dotting the i and j, and crossing the t.

In order to obtain the data of this study, the mean 
manipulation and travel times for each character for 
each individual S were computed by dividing the 
recorded times by the number of distinct letters 
written in that trial. Analysis of variance of these 
manipulation and travel measures was carried out 
to determine whether, in the 10 Ss, a significant dif- 
ference occurs between characters and between Ss 
in regard to the measures of manipulation and 
travel time. 
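The scoring and the analysis of variance described above can be sketched as follows. The function names are hypothetical. The sketch assumes one score per character per S, so the character × subject interaction serves as the residual term. With 35 characters and 10 Ss this gives 34, 9, and 306 degrees of freedom, which matches the df column of Table 2.

```python
def per_character_mean(total_time, n_characters):
    """Clock total for one trial divided by the number of characters written."""
    return total_time / n_characters


def two_way_anova(scores):
    """scores[i][j]: mean time for character i and subject j (one per cell).

    Returns {"characters": (SS, df, MS, F), "subjects": (SS, df, MS, F),
    "residual": (SS, df, MS)}, with the character x subject interaction
    used as the residual (error) term.
    """
    r, c = len(scores), len(scores[0])
    grand = sum(map(sum, scores)) / (r * c)
    row_means = [sum(row) / c for row in scores]
    col_means = [sum(scores[i][j] for i in range(r)) / r for j in range(c)]
    ss_rows = c * sum((m - grand) ** 2 for m in row_means)
    ss_cols = r * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in scores for x in row)
    ss_resid = ss_total - ss_rows - ss_cols
    df_rows, df_cols = r - 1, c - 1
    df_resid = df_rows * df_cols
    ms_rows, ms_cols = ss_rows / df_rows, ss_cols / df_cols
    ms_resid = ss_resid / df_resid
    return {
        "characters": (ss_rows, df_rows, ms_rows, ms_rows / ms_resid),
        "subjects": (ss_cols, df_cols, ms_cols, ms_cols / ms_resid),
        "residual": (ss_resid, df_resid, ms_resid),
    }
```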

In addition to the determination of the signifi- 
cance of differences between characters in regard to 
the duration of the travel and manipulation move- 
ments involved, correlations between the travel and 
manipulation scores for individual Ss were also computed, and a special measure of writing coordination, the manipulation-travel ratio, was computed.




Table 1

The Mean and Standard Deviation Values for Manipulation
Time and Travel Time Required to Write Common Numbers
and Letters of the Alphabet

(Time values are in seconds)

             Manipulation       Travel
Character    Mean     σ         Mean     σ
a            .56     .047       .20     .087
b            .63     .113       .27     .074
c            .41     .192       .25     .065
d            .56     .133       .24     .068
e            .41     .081       .25     .092
f            .59      …         .23     .071
g            .59     .081       .24     .059
h            .61     .213       .23     .054
i            .44     .093       .24     .050
j            .57     .089       .26     .071
k            .82     .210       .25     .075
l            .46     .103       .23     .057
m            .71     .117       .27     .064
n            .55     .093       .27     .072
o            .44     .124       .24     .054
p            .65     .114       .23     .047
q            .73     .094       .24     .047
r            .53     .116       .25     .058
s            .56     .127       .20     .037
t            .55     .109       .35     .067
u            .62     .155       .28     .063
v            .51     .132       .27     .084
w            .72     .188       .28     .077
x            .55     .144       .32     .054
y            .66      …         .27     .125
z            .61     .104       .30     .104
1            .27     .136       .24     .036
2            .49     .107       .24     .067
3            .50     .089       .21     .045
4            .49     .085       .29     .081
5            .53     .110       .27     .057
6            .42     .087       .24     .052
7            .42     .087       .22     .066
8            .50     .074       .22     .071
9            .48     .091       .20     .041
Results 


The results of this study will be discussed 
in relation to the following general topics: 
(a) differences in the duration of the ma- 
nipulation and travel movements in writing 
numbers and letters of the alphabet, (b) the 
nature of individual differences in the ma- 
nipulation and travel movements. 


Table 1 summarizes the means and standard deviations of the durations of the manipulation (contact) and travel movements for the 35 characters written in this experiment. It is evident that there are marked differences in the duration of the manipulative movements in writing common numbers and the letters of the alphabet. The simpler letters, such as i, c, and e, show shorter manipulation times, whereas more complex letters, such as k, m, q, and w, are of relatively long duration. There is little evidence in the data to indicate that letters occurring infrequently in writing take a longer time to write than those occurring frequently.

In contrast to the manipulative movements, the travel movements between single letters in writing remain fairly constant in duration, rarely exceeding .3 sec.

Table 2 presents summaries of the analysis of variance of the measurements of manipulation and travel movements related to different single numbers and letters of the alphabet. As shown in this table, a significant difference between the characters occurs for manipulation, but not for travel movements. The F value for characters in the case of manipulation is significant at the 1% level. The F for characters in the case of the travel movements is not significant. Under the conditions of this study there were no significant differences in the travel movement scores.

Table 2

Summaries of Analysis of Variance

Manipulation (Contact)

Source        Sum of Squares    df    Mean Square    F
Characters         4.13         34       .121        20.1**
Ss                 3.06          9       .340        56.6**
Residual           2.00        306       .006

Travel

Source        Sum of Squares    df    Mean Square    F
Characters         .355         34       .0104
Ss                 .414          9       .0460        1.74
Residual          8.082        306       .0264

** Significant at the 1% level.

The individual differences among the 10 Ss of this study are summarized in Table 3. This table is presented primarily to indicate certain important features of the motion analysis of handwriting and, in particular, the nature of the special measures of individual differences in motor coordination that it is possible to make with the present methods.

Table 3

Characteristics of Individual Differences in
Writing Letters of the Alphabet

           Manipulation      Travel         M/T     M-T
Subject    Mean     σ        Mean     σ     Ratio   Correlation
1          .640   .191       .269   .085    2.38    +.12
2          .566   .154       .280   .077    2.02     .00
3          .413   .105       .243   .070    1.70    +.06
4          .462   .136       .209   .050    2.22    -.31*
5          .478   .120       .279   .098    1.71    +.57*
6          .553    …         .315   .066    1.75    +.13
7          .581   .119       .230   .056    2.52    -.21
8          .423   .091       .208   .045    2.03    -.09
9          .641   .149       .263   .053    2.43    -.07
10         .698   .107       .212   .038    3.29    +.11

* Significant at the 5% level.
** Significant at the 1% level.

The second and third columns of Table 3 
summarize the means and standard deviations 
of the duration of manipulation movements 
for each of the 10 Ss in writing 35 letters and 
numbers. The fourth and fifth columns give
the equivalent measures for travel move- 
ments. Measures of manipulation show a 
wider range of variation than do the 10 in- 
dividual travel measures. The individual 
standard deviations for manipulation are 
roughly double those for travel movements. 
Special indices of motor coordination may 
be derived from measures of the handwriting 
task. One of these special indices is the ratio 
between the duration of manipulation move- 
ments and the duration of travel movements 
in the different writing performances. We 
refer to this index as the M/T handwriting
ratio (see Table 3). Such a ratio is a gen- 
eral measure of relative timing and rhythm 
in the psychomotor performance. Continuing 
investigations suggest that this index will be 
of value in studying the effects of various psy- 
chological variables on the handwriting task. 
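As a check on the definition, the M/T ratio can be recomputed from the tabled means. In the sketch below, the function name is hypothetical. Dividing each S's mean manipulation time by his mean travel time from Table 3 reproduces the published ratios to within about .01. The small discrepancies may reflect rounding of the published means or a different order of averaging.

```python
# (mean manipulation, mean travel, published M/T ratio) for Ss 1-10, Table 3.
TABLE_3 = [
    (.640, .269, 2.38), (.566, .280, 2.02), (.413, .243, 1.70),
    (.462, .209, 2.22), (.478, .279, 1.71), (.553, .315, 1.75),
    (.581, .230, 2.52), (.423, .208, 2.03), (.641, .263, 2.43),
    (.698, .212, 3.29),
]


def mt_ratio(manipulation_time, travel_time):
    """Manipulation-travel (M/T) ratio: manipulation time over travel time."""
    return manipulation_time / travel_time


recomputed = [round(mt_ratio(m, t), 2) for m, t, _ in TABLE_3]
```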
Another general index of motor coordina- 
tion in handwriting is the manipulation-travel 
correlation value. In the present study such
an index of individual differences is derived
from correlating the measures of manipula- 
tion and travel for the 35 numbers and let- 
ters for each S. The correlation value ob- 
tained refers to the degree of relation be- 
tween manipulation and travel for a given 
single individual. Although chance may be 
operating to produce the results, this corre- 
lation value varies sharply for different indi- 
viduals. For one S it is significantly nega- 
tive and for another, significantly positive, 
but for the other Ss it is not significantly
different from zero.
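The report does not state how the significance of these individual correlations was judged. One standard approach, shown here only as an assumption, converts r to a t statistic with n - 2 degrees of freedom. Here n would be the number of characters (35) entering each S's correlation. The resulting t is then compared with a t table. The function name is hypothetical.

```python
import math


def t_for_correlation(r, n):
    """t statistic for testing a Pearson r against zero; df = n - 2."""
    return r * math.sqrt(n - 2) / math.sqrt(1.0 - r * r)
```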


Summary and Discussion 


Handwriting as a psychomotor skill has never been adequately investigated. This research applies, apparently for the first time, precise methods of
motion analysis to the investigation of writ- 
ing skill. Specifically, an Electronic Hand- 
writing Analyzer is described that permits 
separate and automatic measurement of the 
component movements of manipulation and 
travel in the writing task. Results bearing 
on the application of this electronic hand- 
writing analyzer to the measurement of writ- 
ing common numbers and letters of the al- 
phabet are summarized. 

In preliminary results obtained on 10 Ss 
on the writing of single numbers and script 
letters, it was found that the manipulation 
(contact) movement varies significantly in 
duration in writing different letters and num- 
bers. The travel movements associated with 
writing these same letters showed much less 
variation both in relation to the letters and 
numbers written and in relation to individuals.
It is evident that the Electronic Handwriting
Analyzer provides an effective device for
measuring the duration of the component
movements in writing under almost any conditions.

Inasmuch as the handwriting analyzer provides
separate measures of manipulation and
travel in writing performance, it is possible 
to obtain relational measures of motor co- 
ordination with this instrument. One such 
measure is the manipulation-travel ratio (or 
M/T ratio), an index obtained by dividing 
the manipulation time in a given individual
or condition by the duration of the associated
travel movements. Another measure of motor 
coordination is the manipulation-travel cor- 
relation, a value representing the relation be- 
tween the duration of manipulation and 
travel movements. Both of these special 
measures of motor coordination provide in- 
teresting new quantitative expressions of vari- 
ables to be studied in the handwriting task. 
Because of the almost universal nature of 
writing as a psychomotor skill among both 
young and old, precise measurements of writing
may be of very broad significance in the general
analysis of motor coordination in relation to
growth, aging, learning,
and other psychological factors. The Elec- 
tronic Handwriting Analyzer provides the in- 
strumentation essential for such research. 
The device could also be used diagnostically 
in the educational, medical, and industrial 
measurement of motor performance. 


Received December 5, 1955. 


Karl U. Smith and Richard Bloom 




The Journal of Applied Psychology 
Vol. 40, No. 5, 1956 


Speed and Accuracy of Scale Reading as a Function of the 
Number of Reference Markers 


Charles A. Baker and James M. Vanderplas¹
Aero Medical Laboratory, Wright Air Development Center 


Scale reading on a polar coordinate dis- 
play usually involves two processes. One 
process involves identifying the value of the 
scale marker nearest the target or “pip.” 
The other process involves estimating the po- 
sition of the target between scale markers. 
This paper is concerned with both these proc- 
esses as they relate to the speed and accuracy 
of reading target position on a polar coordi- 
nate display. 

Leyzorek (9) and Baker (1) have studied 
interpolation accuracy as a function of the 
distance between scale rings on polar coordi- 
nate displays. Leyzorek found that the av- 
erage error of interpolation was 4% of the 
interval for scale ring separations of .5 in. or 
greater. For scale ring separations of less 
than .5 in. the interpolation error increased. 
Leyzorek’s finding of the .5-in. “critical in- 
terval” agrees generally with the findings of 
Grether and Williams (7) and Carr and 
Garner (3) who varied the separation dis- 
tance between reference marks similar to 
those found on dials. Baker, however, found 
that the error of interpolation was constant 
at 4% of the interval over the entire range 
of scale-ring separation distances studied (.25 
to 4 in.) and for viewing distances up to 40 
in. Kappauf and Smith (8) report the “criti- 
cal interval” distance for graduation marks 
on dials to be .25 in. 

On the basis of these studies one might 
recommend the use of many scale rings in 
order to achieve the highest degree of ac- 
curacy, in that an increase in the number of 
scale rings on a display with a given range 
would mean that the distance represented 
between adjacent scale rings would be pro- 
portionately less. Therefore, the average in- 
terpolation error of 4% of the scale-ring in- 
terval found in the above mentioned studies 
would represent a smaller absolute error. 
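The arithmetic behind this argument is sketched below. It assumes, as the argument does, that the mean interpolation error stays at 4% of the interval however closely the rings are spaced. The 40,000-yd total range is a hypothetical fixed-range display, not one used in the study.

```python
def interval_yards(total_range_yd: float, n_rings: int) -> float:
    """Range represented by one scale-ring interval on a fixed-range display."""
    if n_rings < 1:
        raise ValueError("a display needs at least one ring")
    return total_range_yd / n_rings


def mean_interpolation_error_yd(total_range_yd: float, n_rings: int,
                                fraction_of_interval: float = 0.04) -> float:
    """Absolute interpolation error if it is a fixed fraction of the interval."""
    return fraction_of_interval * interval_yards(total_range_yd, n_rings)


for rings in (5, 10, 20, 40):
    err = mean_interpolation_error_yd(40_000, rings)
    print(f"{rings:>2} rings: interval {interval_yards(40_000, rings):>6.0f} yd, "
          f"mean error {err:5.0f} yd")
```

Under this assumption, doubling the number of rings halves the interval and therefore halves the absolute error.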


¹ J. M. V. is now at Washington University, St.
Louis, Mo.


However, it is reasonable to suspect that as 
the number of scale rings is increased there 
would be an increase in the number of errors 
resulting from the misidentification of the 
scale rings (gross errors) as well as an in- 
crease in the total amount of time required 
to determine the location of the target. 
Garner et al. (5) found that time scores in- 
creased as the number of scale rings increased 
from four to eight. In their study the Ss 
were required to report only which ring was 
marked with an “X.” No interpolation was 
required. Saltzman and Garner (10) noted 
an increase in time scores as a function of 
an increase in the number of polar coordinate 
scale rings when the S’s task was merely to 
count the total number of rings presented. 
Although the number of scale rings was not 
systematically varied, Green and Anderson 
(6) found that the polar coordinate display 
containing the most detail required more time 
than less complex displays. The purpose of this
paper is to report the precise nature of these
functions when the Ss are required to estimate
target position over a wide range of experimental
conditions. It was also expected that increasing
the size of the display
would improve speed and accuracy on those 
displays with many scale rings, since the in- 
crease in display size would increase the 
separation between the scale rings, rendering 
them easier to count and perhaps permitting 
more precise interpolation. 


Method 


Apparatus. The stimulus materials consisted of 
3¼ × 4-in. transparent slides projected from the rear
onto a translucent viewing screen. The viewing
screen was constructed with K & E No. 198 MX
crystalline tracing paper sandwiched between two 
sheets of 3-in. Plexiglas. This arrangement gives a 
fairly realistic simulation of a PPI display, save for 
the lack of temporal and spatial variation in
luminance. On each slide was a circle 2 in. in
diameter. Within this circle were either 0, 2, 4, 9,
19, or 39 equally spaced concentric circles. Every
fifth ring was coded by making the ring twice the
thickness of the other scale rings. One example of
these displays is shown in Fig. 1. Each of the six
different numbers of scale rings used appeared on
10 slides, making a total of 60 slides in all. A
zero-range reference dot appeared in the center of
each display. The scale rings, when projected on the
viewing screen, were .02 in. wide on the 5-in.
display and were proportionately wider for the 7-,
9-, and 11-in. displays; in each case the rings were
light against a dark background. Five targets, each
consisting of an arc .04 in. wide and .25 in. long
(on the 5-in. display), appeared on each of the 60
slides.

Fig. 1. A diagram of one of the polar coordinate
displays showing the scale rings and targets. Note
that every fifth ring is wider to assist in
identification. Each scale ring represents a distance
of 1,000 yards. The contrast relations are reversed,
i.e., the rings and targets in the experiment were
bright traces on a dark background.

The 300 targets presented were distributed equally 
in each quadrant of the display and from the center 
of the display to the outer scale ring. During the 
experiment the S’s head movements were minimized 
by having him place his head against a location on 
a wall directly behind his chair. The viewing screen 
was located 22 in. from S’s eyes, and the slide pro- 
jector was mounted behind the screen so that it 
could be adjusted to give the optimal image of the 
display on the screen. The display sizes used in 
the experiment were 5, 7, 9, and 11 in. in diameter. 
These sizes were obtained by varying the distance 
between the projector and the screen. The plane of 
the viewing screen was normal both to the optical
axis of the projector and to S’s line of sight.
The average luminance of the scale rings and targets 
was 30 mL. and background 2 mL. 

Experimental design. The total of 60 slides was
divided into six groups of 10 slides each. Each of
the 10 slides within a group contained the same
number of scale rings, i.e., each slide contained
either 1, 3, 5, 10, 20, or 40 scale rings (including
the outer circle described above). Each S was tested
over four sessions. In any one session S read all six
groups of slides (60 in all), presented at one of the
four display sizes used. At three subsequent sessions
S read the same slides at each of the three remaining
sizes. The order of presentation of the six groups of
slides read at any one session and the order of
presentation of the four display sizes were balanced
across the group of 12 Ss by a Latin-square design.
Each display size and each group of slides appeared
equally often in each order position and were thus
balanced over the entire group of Ss. The order of
presentation of the 10 slides in a given group of
slides was random. Sources of bias such as practice
effects and differential difficulty of stimulus slides
were thus balanced over the groups, as were order
effects involving number of scale rings.


Subjects. The Ss were 12 men and women stu- 
dents at Antioch College. None had obvious visual 
defects. 


Procedure. A range value of 1,000 yards was as- 
signed to each scale ring, regardless of the number 
of scale rings on the display. Therefore the dis- 
play with one scale ring had a range of 1,000 yards 
and the display with 40 scale rings had a range of 
40,000 yards. In the operational situation a polar 
coordinate display is designed to represent a cer- 
tain fixed maximum range, and an increase in the 
number of scale rings does not increase this range, 
but merely divides this total range into smaller in- 
tervals. Since the numerical value assigned to the 
scale-ring interval is known to affect the accuracy 
of interpolation (4), the 1,000-yard value was as- 
signed to each scale-ring interval in order to avoid 
confounding the number of scale rings used with 
the value of the scale-ring interval. Subjects were 
instructed as follows: “We are going to show you a 
number of slides representing a series of radar scopes. 
The distance between each scale ring represents 1,000 
yards. Your task is to estimate the distance of the 
target from this center dot which represents your 
position. Therefore, this scope with 10 range rings 
has a total range of 10,000 yards. Note that every 
fifth ring is thicker to help you identify them cor- 
rectly. Give your estimate to the nearest 10 yards 
of range. Be as accurate as you can but proceed 
rapidly since you will be timed. Start at the 12 
o'clock position and call out the ranges in clockwise
order." Each S had fifteen minutes of practice
with the various displays before the experiment 
began. 


Results 


The data were recorded in yards of range 
as reported by the Ss. These data were then 
converted into error scores. The error scores 
were obtained by measuring the magnitude 
and direction of the difference between the 
reported range and the actual measured range 
for each target. The actual measured range was
determined to the nearest 10 yards with the use of
a machinists’ etching microscope by two Es
independently. Discrepancies were redetermined until
each measured range was identical for both Es. The
error score in yards was then expressed as a
percentage of the scale interval and as a percentage
of the total range of the display. In addition, each
error score was classified as a
gross error or as an interpolation error, or 
both, depending upon the nature and magni- 
tude of the error. A gross error is defined as 
any reported range of a target that is not in- 
cluded in the range values between the scale 
rings containing that target. In practice, the great
majority of gross errors were reported target ranges
approximately 1,000 yards in error. Less
than 5% of the gross errors were greater 
than a single scale ring interval. All other 
errors were called interpolation errors. Thus 
a reported range of 2,370 for a target whose 
measured range was 1,480 would be called 
both a gross error (1,000 yards) and an in- 
terpolation error (110 yards). Measures of 
the average time required to report the range 
of each target on a slide were also obtained
by recording to the nearest second the time
taken to report all five ranges on a given slide 
and dividing this figure by five. 
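The classification of errors described above can be expressed as a signed decomposition. The gross component counts whole 1,000-yd intervals between the interval the S named and the interval containing the target. The interpolation component compares the positions within those intervals. The function names are hypothetical. The paper does not say how a reading falling exactly on a ring was handled; here such a reading is assigned to the interval above it, which is an assumption. For the worked example in the text, `decompose_error(2370, 1480)` gives a gross component of 1,000 yd and an interpolation component of -110 yd, an underestimate of 110 yd within the interval. This matches the classification given above.

```python
INTERVAL_YD = 1000  # every scale-ring interval stood for 1,000 yards


def decompose_error(reported_yd: int, measured_yd: int,
                    interval_yd: int = INTERVAL_YD) -> tuple[int, int]:
    """Split a signed range error into gross and interpolation components.

    A reading exactly on a ring falls in the interval above it (assumption).
    The two components always sum to reported_yd - measured_yd.
    """
    gross = (reported_yd // interval_yd - measured_yd // interval_yd) * interval_yd
    interpolation = reported_yd % interval_yd - measured_yd % interval_yd
    return gross, interpolation


def as_percent(error_yd: float, reference_yd: float) -> float:
    """Express an error as a percentage of an interval or of a total range."""
    return 100.0 * error_yd / reference_yd


gross, interp = decompose_error(2370, 1480)      # the worked example in the text
print(gross, interp)                             # signed components in yards
print(as_percent(interp, INTERVAL_YD))           # % of the scale interval
print(as_percent(interp, 10 * INTERVAL_YD))      # % of range on a 10-ring display
```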

In Fig. 2 is presented a combined plot of 
average error, expressed as a percentage of 
the total range (interpolation error), percent- 
age of gross errors, and mean time per target 
required to report range, each variable being 
plotted as a function of the number of range 
rings. These data are shown for all scope 
sizes combined. It can be seen from an ex- 
amination of Fig. 2 that interpolation error 
is a decreasing function of the number of 
range rings, while the frequency of gross 
errors and time are both increasing functions 
of the number of range rings. It should be 
noted that although the frequency of gross 
errors increases with an increase in the num- 
ber of scale rings, the magnitude of these 
gross errors, expressed as a percentage of 
total range, decreases. More than 95% of 
the gross errors are of the order of magnitude 
of one scale ring interval. A gross error of 
one interval on a 40-ring display represents 
2.5% of the total range, whereas a gross 
error on a display with five scale rings represents
20% of the total range. Thus, an increase in the
number of scale rings results in an essentially
proportional decrease in the magnitude of gross
errors.

[Figure: gross errors (% of readings), interpolation errors (% of total range), and time per target plotted against number of range rings, 1 to 40.]

Fig. 2. Gross errors, average interpolation error,
and time required to read target position as a
function of number of scale rings, for all display
sizes combined. Each point on the graph consists of
2,400 observations.

[Figure: average error in % of interval plotted against distance between range rings.]

Fig. 3. Average interpolation error as a function
of the distance between scale rings. The distance
between scale rings is determined by the number of
scale rings on the display and the display size. Since
four scope sizes and six scope types were used, 24
different scale-ring separations result. Note the break
in the function at about .4-in. separation.
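Each display's total range is its number of rings times one 1,000-yd interval. A one-interval gross error is therefore 100/n percent of the range on an n-ring display. The sketch below reproduces the 2.5% (40 rings) and 20% (five rings) figures quoted above. Following the text, it assumes that a gross error spans exactly one interval, which is true of more than 95% of them. The function name is hypothetical.

```python
RING_COUNTS = (1, 3, 5, 10, 20, 40)


def gross_error_pct_of_range(n_rings: int, intervals_missed: int = 1) -> float:
    """Size of a gross error as % of total range on an n-ring display."""
    if n_rings < 1:
        raise ValueError("a display needs at least one ring")
    return 100.0 * intervals_missed / n_rings


for n in RING_COUNTS:
    print(f"{n:>2} rings: one-interval gross error = {gross_error_pct_of_range(n):.1f}% of range")
```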

Figure 3 is a graph of interpolation error 
as a function of the distance between scale 
rings. The distance between the scale rings 
is determined by both the number of rings 
and the display size. Since four sizes of dis- 
play and six different numbers of scale rings 
per display were used, 24 actual distances be- 
tween markers result. These data resemble 
those reported by Leyzorek (9) who found 
the “critical interval” to fall at .5-in. sepa- 
ration for polar coordinate displays. It should 
be noted, however, that the absolute error 
magnitude continues to decrease with de- 
creasing interval sizes below .5 in. 
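The 24 separations plotted in Fig. 3 can be listed directly. The sketch assumes that the rings are equally spaced from the center dot to the outer ring. It also assumes that the outer ring sits at the edge of the stated display diameter, so the separation is the radius divided by the number of rings. Under these assumptions the separations run from about .06 in. to 5.5 in., all 24 are distinct, and 11 of them fall below Leyzorek's .5-in. critical interval.

```python
SIZES_IN = (5, 7, 9, 11)           # display diameters in inches
RING_COUNTS = (1, 3, 5, 10, 20, 40)
CRITICAL_IN = 0.5                  # Leyzorek's critical interval


def ring_separation_in(diameter_in: float, n_rings: int) -> float:
    """Distance between adjacent rings: n equally spaced rings out to the display edge."""
    return (diameter_in / 2) / n_rings


separations = {(d, n): ring_separation_in(d, n) for d in SIZES_IN for n in RING_COUNTS}
below = sorted((s, d, n) for (d, n), s in separations.items() if s < CRITICAL_IN)

print(len(separations), "separations; smallest", min(separations.values()),
      "in.; largest", max(separations.values()), "in.")
print(len(below), "of them fall below", CRITICAL_IN, "in.")
```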

Constant errors. Figure 4 contains a plot 
of the constant errors of interpolation as a 
function of the location of the target between 
scale rings. The constant error is the dif- 
ference between the mean of all the readings 
for each target and the measured position of 
that target. Positive constant errors are as- 
sociated with overestimation of target posi- 
tion, while negative constant errors are as- 
sociated with underestimation. Examination 
of Fig. 4 shows that the nature of the constant
error is affected by the number of scale
rings. When 10, 20, or 40 scale rings are
used, the constant errors tend to be negative 
for target positions near the inner scale ring 
and then become large and positive for target 
positions just past the mid-point of the scale 
ring interval, diminishing near the outer scale 
ring. The constant error functions for the 
10, 20, and 40 scale-ring displays were very 
similar and for simplicity of graphing they 
were averaged and plotted together. The 
constant error functions for 1, 3, and 5 scale- 
ring displays were also very similar, and 
again the average curve is plotted. However,
this curve is quite different from the constant
error function for the 10-, 20-, and 40-ring
displays. The errors associated with the displays
with fewer rings are always positive and peak for
target positions midway through the scale-ring
interval. With an increase in the number of
scale rings there is a proportional decrease in 
the distance between scale rings. The con- 
stant error function for the 10, 20, and 40 
scale-ring displays is quite similar to that 
which Carr and Garner (3) found for refer- 
ence markers which were closely spaced. The 
constant error function for the 1, 3, and 5 
scale-ring displays, however, does not con- 
form with other studies (2, 3, 4) in which 
similar scale-interval distances were employed. 


[Figure: constant error (% of interval) plotted against target position (% distance from inner range ring, 5 to 95); one curve for 1, 3 & 5 range rings and one for 10, 20 & 40 range rings.]

Fig. 4. Constant error as a function of the position
of the target between adjacent scale rings. The
solid line is the average curve for displays with 1,
3, and 5 scale rings. The broken line is the average
curve for displays with 10, 20, and 40 scale rings.
There are 720 observations for each point.


All three of these studies are in agreement 
with respect to the constant error function in 
that the targets were judged to be too close 
to the mid-point of the interval. Therefore, 
there are positive constant errors for target 
positions below the mid-point and negative 
constant errors above the mid-point. The 
reasons that the present data for the 1, 3, 
and 5 scale-ring displays do not conform with 
the data from these other papers are not 
known. 
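The constant error defined above can be computed per target as the mean of all readings minus the measured position, expressed as a percentage of the interval. A positive value indicates overestimation and a negative value indicates underestimation. Each target's position within its interval gives the abscissa of Fig. 4, which the figure groups in steps centred at 5, 15, ..., 95%. The readings below are hypothetical. The sketch assumes that any whole-interval (gross) component has already been removed, since Fig. 4 concerns constant errors of interpolation.

```python
from statistics import fmean

INTERVAL_YD = 1000


def constant_error_pct(readings_yd, measured_yd, interval_yd=INTERVAL_YD):
    """Mean reading minus measured position, as % of the scale interval.

    Positive = overestimation, negative = underestimation.
    """
    if not readings_yd:
        raise ValueError("need at least one reading")
    return 100.0 * (fmean(readings_yd) - measured_yd) / interval_yd


def position_pct(measured_yd, interval_yd=INTERVAL_YD):
    """Target position as % of the distance from the inner to the outer ring."""
    return 100.0 * (measured_yd % interval_yd) / interval_yd


# Hypothetical interpolation readings of one target by several Ss (yards).
readings = [1500, 1560, 1530]
print(position_pct(1480), constant_error_pct(readings, 1480))
```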

Display size. Display size had only a small 
effect upon interpolation accuracy. Averaged over
all scopes, interpolation error decreased linearly
from 6% to 5.2% of the interval. Gross errors
decreased markedly in
a linear fashion from the smallest display 
(3.4% of the readings in error) to the largest 
display (1.7% of the readings in error). 
Mean reading time did not vary systemati- 
cally with display size. 

Practice. Throughout the four sessions there was
continued improvement in both speed and accuracy.
Average interpolation error decreased linearly from
6.2% of the interval for the first session to 4.8%
for the last (fourth) session. Gross errors decreased
in an approximately linear fashion from 3.4% for the
first session to 1.5% for the last session. Average
time per target decreased from 5.7 sec. in the first
session to 4.2 sec. in the last. This curve was
negatively accelerated, but an asymptote had not yet
been reached. It is likely that with continued
practice the error and time measures would continue
to decrease. The values shown in the previous
figures would thus probably be somewhat lower, in
general, for more experienced Ss than for the Ss
used in the present study.


Summary 


The study was designed to investigate the 
speed and accuracy of determining target po- 
sition on a polar coordinate display as a func- 
tion of the number of scale rings. Polar 
coordinate displays of 5, 7, 9, or 11 in. in
diameter with 1, 3, 5, 10, 20, or 40 scale 
rings were used. 

Error of interpolation (in percentage of the
total range of the display) decreased as the number
of scale rings increased. The frequency of gross
errors (misidentification of scale rings) and the
time required to make readings increased with the
number of scale rings. Increasing display size
improved interpolation accuracy slightly and
decreased the frequency of gross errors markedly.

Constant errors of interpolation were found 
to be a function of the position of the target 
between scale rings and also a function of the 
number of scale rings used. 

An analysis of practice effects reveals that 
the Ss continued to improve in both speed 
and accuracy throughout the experiment. 


Received December 12, 1955. 


References 


1. Baker, C. A. Interpolation accuracy as a function of the visual angle between scale marks. J. exp. Psychol., 1954, 47, 433-436.

2. Barber, J. L., & Garner, W. R. The effect of scale numbering on scale-reading accuracy and speed. J. exp. Psychol., 1951, 41, 298-309.

3. Carr, W. J., & Garner, W. R. The maximum precision of reading fine scales. J. Psychol., 1952, 34, 85-94.

4. Chapanis, A., & Leyzorek, M. Accuracy of visual interpolation between scale markers as a function of the number assigned to the scale interval. J. exp. Psychol., 1950, 40, 655-667.

5. Garner, W. R., Saltzman, D., & Saltzman, I. J. Some design factors affecting the speed of identification of range rings on polar coordinate displays. Systems Research, ONR, SDC Rep., 1954 (Tech. Rep. SDC 166-I-95).

6. Green, B. F., & Anderson, L. K. Speed and accuracy of reading polar coordinates on a horizontal plotting table. J. appl. Psychol., 1955, 39, 227-236.

7. Grether, W. F., & Williams, A. C., Jr. Psychological factors in instrument reading: II. The accuracy of pointer position interpolation as a function of the distance between scale marks and illumination. J. appl. Psychol., 1949, 33, 594-604.

8. Kappauf, W. E., Jr., & Smith, W. M. A preliminary experiment on the effect of dial gradation and dial size on the speed and accuracy of dial reading. Ann. N. Y. Acad. Sci., 1951, 51, 1272-1277.

9. Leyzorek, M. Accuracy of visual interpolation between circular scale markers as a function of the separation between markers. J. exp. Psychol., 1949, 39, 270-279.

10. Saltzman, I. J., & Garner, W. R. The effect of size and brightness on the speed of identifying number of range rings. Systems Research, ONR, SDC Rep., 1949 (Tech. Rep. SDC 166-79).






Interpersonal Perceptions of Open-Hearth Foremen 
and Steel Production 


Walter A. Cleven and Fred E. Fiedler¹


University of Illinois 


This investigation is one of a series of 
studies on the relationship of interpersonal 
perception and group effectiveness (2, 3, 4). 
It was designed to test an hypothesis which 
grew out of earlier studies on basketball and 
surveying teams (2), and military aircraft 
and tank crews (3). The present research 
was conducted in open-hearth shops of a 
large steel company in which the personnel, 
in contrast to subjects (Ss) participating in 
these earlier studies, is highly stable over 
time, and where carefully maintained produc- 
tion records are available. 

Interpersonal perception is measured here 
by means of the score, Assumed Similarity 
between Opposites, or ASo. This score re- 
flects the extent to which the subject (S) 
predicts different responses for the man with 
whom he can work best, and the man with 
whom he can work least well. We interpret 
ASo as a measure of the psychological dis-
tance which S perceives between himself and
his co-workers. Supervisors who predict simi- 
lar responses for their best- and least-liked 
co-workers (high ASo) are, by this inter- 
pretation of ASo, more accepting, approach- 
able individuals, while supervisors who pre- 
dict less similar responses for these workers 
(low ASo) are presumably more critical and 
analytic in their work relationships. The 
primary hypothesis to which these earlier 
group studies have led is that more produc- 


¹ We are indebted to Messrs. J. H. Vohr, P. E.
Thomas, E. C. Sorrells, H. W. Erler, and G. H. 
Warnock, of the Gary Steel Works; and to Mr. Dan 
Farrell and Mr. E. W. Kempton, of the United States 
Steel Corporation, for their cooperation and support 
of this study. We are also indebted to Drs. L. J. 
Cronbach, Eleanor P. Godfrey, Ross Stagner, and 
C. F. Wrigley for their contributions to the design 
and administration of the study, and to Mrs. Betty 
F. Mannheim who assisted with the analysis of the 
data. 

The study was conducted under Contract N6-ori- 
07135 between the Office of Naval Research and the 
University of Illinois, with F. E. Fiedler as principal 
investigator. 


tive groups have leaders who tend to differ- 
entiate more in perceptions of their most and 
least preferred co-workers (i.e., have lower 
ASo) than do leaders of less effective groups. 

The earlier reports in this series of studies 
have suggested some possible restrictions or 
special cases of this general hypothesis. In 
particular, three points have been noted, 
which we shall mention briefly. The data 
suggest (a) that the hypothesis holds espe- 
cially in the case of groups which accept the 
leader. This acceptance is defined in terms 
of the number of sociometric choices the 
leader receives from his group (3). (b) The 
sociometric likes and dislikes of the leader 
for certain key personnel in the group may 
be important in determining the direction of 
relationship between ASo of the leader and 
the effectiveness of the group. It was noted 
that effective groups may have either low 
ASo leaders who sociometrically prefer key 
subordinates, or high ASo leaders who do not 
sociometrically choose their key subordinates. 
(c) It has also been suggested that the hy- 
pothesis may be valid only for tasks which 
require “direction-giving leadership behavior” 
(3). More effective groups engaged in tasks 
which require receptive leadership behavior 
for effective group coordination may have 
leaders with high ASo, regardless of the lead- 
er’s preferences for his key subordinates. 
We have not investigated the restrictions 
based on sociometric choice in the present 
paper, in part because of our limited sample 
size, and in part because of the possibility 
that sociometric choices become relatively 
less important in long-lived groups. Men 
who at first may find it difficult to get along 
together will, in the course of two, five, or 
ten years either learn to do so or else leave 
the group by transferring to another crew or to
another job. It seems reasonable to assume that
men who have worked together for
several years are rather adequately adjusted 


312 


Interpersonal Perceptions 


to each other, an assumption which would 
not be warranted in relatively short-lived 
groups such as military units. 


Procedure 


Sample. Management personnel in four open- 
hearth shops of a large steel company participated 
in the study. These four shops are engaged in simi- 
lar operations, although equipment varies somewhat 
from shop to shop. 

Each shop is operated on a 24-hour, 7-day-week 
basis with each shift or “turn” working eight hours. 
Every turn has a full complement of first- and sec- 
ond-line supervisors and their crews. Since one turn 
is off duty in any one 24-hour period, each shop
requires four turns. A total of 16 turns thus
constitutes our sample.²

Four supervisors are in charge of each turn: one 
General Foreman, one Stock Foreman, one Pit Fore- 
man, and one Senior Melter. The General Foreman, 
along with the Stock and Pit Foremen, directs the
supporting operations of raw material assembly and 
final steel pouring. The Senior Melter is in charge 
of steel manufacture. Depending on the number 
and size of furnaces in the shop, the Senior Melter 
supervises one or two Junior Melters and their 
crews. In three of the four shops (or 12 of the 16 
turns), the Senior Melter has two Junior Melters 
reporting to him. 

Test instrument. Each available foreman and 
melter was requested to predict the responses of two 
persons he had known: (a) the man with whom he 
could work best, and (b) the man with whom he 
could work least well. (These ratees could be any- 
one with whom S had ever worked; S was not asked 
to specify their names.) If the two predictions by 
a single S are quite different, he is said to have low 
ASo; conversely, if the two predictions are quite 
similar, he is said to have high ASo. The test con- 
sisted of 40 statements such as: “I tend to join 
many organizations,” “I am often bored with peo- 
ple,” and “I am generally regarded as optimistic.” 
Each item was answered on a six-point scale rang- 
ing from “definitely true” to “definitely untrue.” 
The similarity of these two predictions, computed 
by the statistic D (1, 5), yields the index, “As- 
sumed Similarity between Opposites” (ASo). 
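The excerpt cites the statistic D without defining it. The sketch below assumes the common profile-distance form: the square root of the summed squared item differences between the two predicted response profiles. Responses are taken as coded 1 to 6 on the six-point scale, which is also an assumption about coding. D grows as the two predictions diverge, so a larger D means less assumed similarity. This excerpt does not state how the authors signed the ASo scores reported in Table 1. Identical predictions give D = 0. Maximally different predictions on 40 six-point items give √(40 × 25) ≈ 31.6.

```python
import math
from typing import Sequence

SCALE_MIN, SCALE_MAX = 1, 6  # six-point scale; numeric coding is an assumption


def profile_distance_d(best: Sequence[int], least: Sequence[int]) -> float:
    """D = sqrt of summed squared item differences between two predicted profiles."""
    if len(best) != len(least):
        raise ValueError("both predictions must cover the same items")
    for v in (*best, *least):
        if not SCALE_MIN <= v <= SCALE_MAX:
            raise ValueError(f"response {v} outside the {SCALE_MIN}-{SCALE_MAX} scale")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(best, least)))


# Hypothetical predictions on four of the 40 items for one supervisor.
most_preferred = [5, 2, 6, 4]
least_preferred = [3, 2, 4, 1]
print(round(profile_distance_d(most_preferred, least_preferred), 2))
```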


Criterion 


The index of group effectiveness is based on the 
time elapsed from one “tap” (pouring of molten 
metal from the furnace) to the next tap on a par- 
ticular furnace. For economic reasons, company 
officials regard short “tap-to-tap time” as the most 
important production goal. The primary impor- 
tance of this production goal is recognized and ac- 
cepted by the foremen as well as their subordinates. 


² All four turns within a shop are under the
direction of a single shop superintendent and his
assistant superintendent. These men were not tested.


The average tap-to-tap time is about 10 hours; 
two turns are, therefore, involved in preparing each 
batch of steel, or “heat,” for tapping. However, the 
tap-to-tap time scores are uniformly assigned to the 
Senior Melter in charge of the furnace at the time 
the tap is made, regardless of the length of time the 
shift has actually worked on the heat. This seems
justifiable because the last hours of the heat are
regarded as more critical in the manufacturing
process than the first few hours. In addition,
randomization takes place because the turns do not
systematically follow one another or use the same
furnaces in the shop.

Using an analysis of variance of ranked data, we 
found significant differences between shops in tap- 
to-tap time. Since these differences can be attributed 
to different furnace capacities in the four shops, the 
tap-to-tap time data were standardized within shops 
by means of T scores. This procedure is designed 
to eliminate variance due to differences in equip- 
ment and to retain the variance which may be at- 
tributable to leadership variables. 
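Standardizing within shops can be sketched as follows. The article does not say whether linear or normalized T scores were used. This sketch assumes linear T scores (mean 50, SD 10 within each shop, using the population SD) and runs on made-up tap-to-tap times. The helper name is hypothetical. Because the transformation is linear within each shop, a shorter-than-average time still maps to a T score below 50.

```python
from collections import defaultdict
from statistics import mean, pstdev

def t_scores_within_groups(records):
    """Convert (group, value) pairs to T scores standardized within each group.

    T = 50 + 10 * (value - group mean) / group SD, so every group (shop)
    ends up with mean 50 and SD 10. Assumes each group has a nonzero SD.
    """
    values_by_group = defaultdict(list)
    for group, value in records:
        values_by_group[group].append(value)
    moments = {g: (mean(v), pstdev(v)) for g, v in values_by_group.items()}
    return [50 + 10 * (value - moments[group][0]) / moments[group][1]
            for group, value in records]


# Hypothetical tap-to-tap times (hours) for turns in two shops.
turns = [("shop A", 8.0), ("shop A", 12.0), ("shop B", 20.0), ("shop B", 30.0)]
print(t_scores_within_groups(turns))  # [40.0, 60.0, 40.0, 60.0]
```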

Criterion reliability. The reliability of tap-to-tap
scores was based on an analysis of over 25,000 heats
from the 3- to 16-month period preceding testing.
We excluded the summer months on the recommendation
of company officials because of extensive personnel
shifts due to vacation schedules. An even-month vs.
odd-month split-half procedure was employed. The
estimated reliability of tap-to-tap time over the 16
turns is .82. In order to minimize the effects of
long-range changes, e.g., in personnel or company
policy, the criterion scores used below are based on
only a part of these data, namely the 3- to 10-month
period immediately preceding testing.
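The split-half estimate can be sketched in two steps. First, correlate each turn's mean tap-to-tap time over the even months with its mean over the odd months. Then step that half-length correlation up to full length. The article does not state whether the reported .82 includes the Spearman-Brown correction. This sketch assumes it does, and all function names are hypothetical.

```python
import math

def pearson_r(x, y):
    """Product-moment correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def spearman_brown(r_half):
    """Step a half-length reliability up to full length: 2r / (1 + r)."""
    return 2 * r_half / (1 + r_half)

def split_half_reliability(even_month_means, odd_month_means):
    """Correlate the two halves across turns, then apply Spearman-Brown."""
    return spearman_brown(pearson_r(even_month_means, odd_month_means))


print(round(spearman_brown(0.5), 2))  # 0.67
```

For example, a half-length correlation of .50 steps up to 2(.50)/(1 + .50), or about .67.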


Results 


Table 1 presents the correlations between
the average turn tap-to-tap time and the ASo
(Assumed Similarity between Opposites) of
the General, Pit and Stock Foremen, and
Senior Melters. As the table shows, the
correlation between average turn tap-to-tap time
and ASo is significant in the predicted direction
in the case of Senior Melters and Pit
Foremen. The correlation falls short of an
acceptable significance level for Stock Foremen,
and is negligible for General Foremen.
In addition, the average ASo of the foremen
and Senior Melter on each turn is significantly
related to average turn tap-to-tap time.


Table 1

Correlations (Rho) Between ASo of Various
Supervisors and Average Turn Tap-to-Tap Time

Supervisor ASo        N*    Rho     p
General Foreman       15    -.13    n.s.
Stock Foreman         15    -.42    n.s.
Pit Foreman           14    -.12    <.01
Senior Melter         15    -.54    <.05
Supervisor average    16    -.1     <.01

* N varies due to missing data.
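Spearman's rho, the coefficient reported in Table 1, is the product-moment correlation computed on ranks. A self-contained sketch follows. Tied values receive the mean of their ranks, and the names are hypothetical. One hedged reading of the signs in Table 1: if ASo enters the correlation as the raw D score, where a large D means low assumed similarity, then the hypothesis predicts a negative rho with tap-to-tap time, which is the direction shown. The sketch computes only the coefficient. Significance for samples of 14 to 16 would be read from a table of critical values.

```python
def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # mean of the 1-based positions i..j
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks

def spearman_rho(x, y):
    """Spearman's rho as the Pearson correlation of the two rank lists."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sxx = sum((a - mx) ** 2 for a in rx)
    syy = sum((b - my) ** 2 for b in ry)
    return sxy / (sxx * syy) ** 0.5


# Hypothetical D scores and mean tap-to-tap times for three turns.
print(spearman_rho([10.2, 7.5, 12.8], [9.6, 10.4, 9.9]))  # -0.5
```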


Discussion 


Our hypothesis states that more effective 
groups have supervisors with low ASo. The 
significant correlations of Senior Melter and 
Pit Foreman ASo, as well as average super- 
visor ASo, with average turn tap-to-tap time 
support this hypothesis. 

Of particular interest is the high correla- 
tion between mean turn tap-to-tap time and 
ASo of Pit Foremen. On the surface, the 
melters appear to determine tap time, since 
it is their decision as to when the heat is 
ready to be tapped. These results suggest 
that variance in tap-to-tap time may also be 
a function of the Pit Foreman, although, con- 
versely, the ASo of these supervisors may be 
a function of the turn efficiency or of some 
related variable. On the other hand, the 
low correlation in the case of the General 
Foremen may indicate that these men have 
the least influence on turn efficiency, as 
measured by tap-to-tap time, even though the 
limited number of cases in our sample does 
not enable us to reject the hypothesis that all 
differences among the obtained correlations 
are a matter of chance. 

The fact that mean turn tap-to-tap time is 
negatively related to the mean ASo of the 
turn’s four foremen suggests that ASo scores 
within turns may be homogeneous. A ranked 
analysis of variance test shows this to be the 
case. The long-lived nature of these groups 
could cause this covariation of ASo. For ex- 
ample, if more similar Ss are more congenial 
to each other, selective factors may operate 
in the personnel placement process, such that 
similar Ss tend to be assigned to the same 
turn. Alternatively, changes in interpersonal 
perception may occur as a result of group 
processes within the turns. The possibility 
of such changes is suggested by the recent 
findings of Steiner and Dodge (6). 
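The text does not name its ranked analysis of variance. A common choice is the Kruskal-Wallis H test, sketched here as an assumption. To test homogeneity within turns, each group would hold the ASo scores of one turn's supervisors. The same approach could apply to the between-shop comparison of tap-to-tap time described under Criterion. H is referred to a chi-square distribution with k - 1 degrees of freedom, where k is the number of groups. This version omits the correction for ties, and the function name is hypothetical.

```python
def kruskal_wallis_h(groups):
    """Kruskal-Wallis H for k independent groups (no correction for ties)."""
    pooled = sorted(v for g in groups for v in g)
    n = len(pooled)
    # Mean 1-based rank for each distinct value, so tied values share a rank.
    first, last = {}, {}
    for pos, v in enumerate(pooled, start=1):
        first.setdefault(v, pos)
        last[v] = pos
    rank = {v: (first[v] + last[v]) / 2 for v in first}
    # Sum over groups of (rank sum)^2 / group size.
    h = sum(sum(rank[v] for v in g) ** 2 / len(g) for g in groups)
    return 12 / (n * (n + 1)) * h - 3 * (n + 1)


print(round(kruskal_wallis_h([[1, 2, 3], [4, 5, 6]]), 2))  # 3.86
```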


Walter A. Cleven and Fred E. Fiedler 


Summary and Conclusions 


A study was conducted relating the inter- 
personal perceptions of open-hearth shop 
foremen to the productivity of their work 
units. Interpersonal perception was meas- 
ured by means of Assumed Similarity (ASo) 
tests which reflect how similar or different a 
person describes his most and his least pre- 
ferred work companions. Group effectiveness 
measures were based on output as indicated 
by “tap-to-tap” time, the time required to 
complete a “heat” of steel. This criterion 
measure has considerable stability and is re- 
garded as the most important production in- 
dex by company officials. 

Management personnel of four open-hearth 
shops of a large steel company participated 
in this study. Interpersonal perception (ASo) 
tests were administered to all available Ss. 

Significant relations were found between 
supervisor ASo and the tap-to-tap time index. 
These results are consistent with the hypothe- 
sis that more effective groups have super- 
visors who tend to predict different responses 
for their most- and least-preferred co-workers. 


Received November 25, 1955. 


References 


1. Cronbach, L. J., & Gleser, Goldine C. Assessing
similarity between profiles. Psychol. Bull.,
1953, 50, 456-473.

2. Fiedler, F. E. Assumed similarity measures as
predictors of team effectiveness. J. abnorm.
soc. Psychol., 1954, 49, 381-387.

3. Fiedler, F. E. The influence of leader-keyman
relations on combat crew effectiveness. J. abnorm.
soc. Psychol., 1955, 51, 227-235.

4. Godfrey, Eleanor P., Wrigley, C. F., Mannheim,
Betty F., Hall, D. M., & Fiedler, F. E. The
effect of interpersonal relations on the success
of consumer cooperatives. Proc. Amer.
sociol. Soc., 1955. (Abstract)

5. Osgood, C. E., & Suci, G. A measure of relation
determined by both mean difference and profile
information. Psychol. Bull., 1952, 49,
251-262.

6. Steiner, I. D., & Dodge, Joan S. Interpersonal
perception and role structure as determinants
of group and individual efficiency. Hum.
Relat., in press.




The Journal of Applied Psychology 
Vol. 40, No. 5, 1956 


Encouragement, Anxiety, and Test Performance¹


Daniel Sinick 


New York Regional Office, Veterans Administration 


The customary practice in individual test- 
ing has been for the examiner to offer the ex- 
aminee encouragement between subtests. In- 
structions accompanying individual mental 
tests uniformly encourage the use of encour- 
agement (20, p. 57; 21, p. 171; 9, p. 6). 
This has been true despite a concurrent em- 
phasis throughout the development of psy- 
chological testing on standardization of test 
conditions. Encouragement has been offered 
testees as perhaps a corollary of the axiom of 
psychological testing that each test should 
measure a testee’s best possible performance. 

Not all writers on tests have given com- 
plete acceptance, however, to the use of en- 
couraging comments. Bingham (3, pp. 226- 
227), Goodenough (6, p. 305), and Terman 
and Merrill (20, p. 57) recommend that en- 
couragement be used with caution and mod- 
eration. Cole (4) contends that encourage- 
ment may interfere with rapport and reduce 
the validity of test results. 

If a testee’s best possible performance and 
valid test results are to be achieved, indi- 
vidual differences between testees should be 
taken into account. Since encouragement may 
have different effects upon different testees, it 
would be of value to know, in advance of in- 
dividual testing, which testees would benefit 
from encouragement and which would not. 
The present experiment used measured anxiety
in an effort to differentiate subjects (Ss)
whose performance is improved by encouragement
from those whose performance is impaired
by it. A variable which can so differentiate
Ss could function as a barometer to indicate
whether the examiner should blow hot
or cold.


Method 


Population sample. The sample consisted of stu- 
dents, mostly freshmen, enrolled in an introductory 
psychology course at a community college offering a 
two-year terminal program. Of the 211 students in 


¹ Based upon a Ph.D. thesis accepted by New
York University in 1955. 


the sample, 161 were men and 50 women. Their 
ages ranged from 16 to 27, but 42 per cent were 18.
To these 211 students assembled in six class sections, 
the investigator administered two anxiety scales for 
the purpose of dividing these students into three 
groups with varying amounts of anxiety. 

Anxiety instruments. The anxiety scales employed
were the Taylor Manifest Anxiety Scale (19) and 
an anxiety questionnaire developed by Sarason and 
Mandler (16). The former scale tends to measure 
personality anxiety, while the latter is especially 
geared toward anxiety in the testing situation. The 
50 MMPI items selected and revised by Taylor were 
employed in this study without filler items. Mc- 
Creary and Bendig (12) found a correlation of .95 
between the 50-item form and the 225-item form 
containing 175 unscored items. The Sarason-Man- 
dler scale has 35 scored items and four fillers. 

Selection of subjects. Since this experiment was 
concerned with both the anxiety which Ss bring 
with them to the testing situation and the anxiety 
evoked thereby, the two anxiety instruments were 
used jointly to create the requisite groups. It was 
also believed that two such instruments would serve 
as checks against each other to avoid the improper 
placement of some Ss in the three anxiety cate- 
gories. 

Standard scores on the two scales were averaged 
to form a combined distribution, from which were 
selected 28 Ss at the high-anxiety end of the dis- 
tribution (above the 81st percentile), 26 at the low- 
anxiety end (below the 19th percentile), and 30 in 
the middle (44th to 61st percentiles, inclusive). The 
Ss in each anxiety category were later divided into 
two groups equated on the basis of total scores on 
the MacQuarrie Test for Mechanical Ability. 
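The selection procedure can be sketched in four steps. Convert each scale to z scores. Average the two z scores for each S. Compute percentile ranks in the combined distribution. Then apply the cutoffs given above. Defining a percentile rank as the percentage of scores below a value, with ties counted as half below, is an assumption, as are the helper names.

```python
from statistics import mean, pstdev

def z_scores(values):
    """Standard scores using the population SD."""
    m, sd = mean(values), pstdev(values)
    return [(v - m) / sd for v in values]

def percentile_ranks(values):
    """Percent of scores below each value, counting ties (itself included) as half."""
    n = len(values)
    return [100 * (sum(u < v for u in values) + 0.5 * sum(u == v for u in values)) / n
            for v in values]

def anxiety_category(pr):
    """Cutoffs from the text; Ss outside all three bands are not selected (None)."""
    if pr > 81:
        return "high"
    if pr < 19:
        return "low"
    if 44 <= pr <= 61:
        return "middle"
    return None

def classify(taylor, sarason):
    """Average the two scales' z scores, then categorize by percentile rank."""
    combined = [(a + b) / 2 for a, b in zip(z_scores(taylor), z_scores(sarason))]
    return [anxiety_category(pr) for pr in percentile_ranks(combined)]


cats = classify(list(range(100)), list(range(100)))
print(cats.count("low"), cats.count("middle"), cats.count("high"))
```

With 100 untied, perfectly concordant scores, this selects 19 low-, 17 middle-, and 19 high-anxiety Ss.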

Experiment proper. About two weeks after the 
high-, low-, and medium-anxiety Ss had been selected, 
each of them was notified, by means of a
mimeographed 3 × 5 card used by the dean’s office, to
report to a designated room at a designated time,
almost always a free period. When each S arrived at
this room, which was furnished with a large table
and two chairs facing each other, he was told, “Some
of you have been selected by lot to take a short
aptitude test. It takes about twenty minutes. Later
on you will be given the results.”

The investigator then individually administered to 
each S the MacQuarrie Test for Mechanical Ability. 
This test, actually a battery of seven subtests, was 
selected as providing a series of timed tasks not 
definitely related to any specific aptitude or ability 
and “relatively independent of intelligence in per- 
sons of similar status” (18, p. 267). To avoid the 
possibility of some Ss’ ostensibly not caring whether 




they did well on a test of “mechanical ability,” the 
test was begun with the cover folded back. To as- 
sure some personal involvement, each S’s name was 
written conspicuously on the first open page. The 
test was administered without any comments by the 
investigator. 

On the basis of total scores on the MacQuarrie, 
the Ss in each of the three anxiety categories were 
divided into two groups equated as closely as pos- 
sible with regard to means and standard deviations. 
About six weeks after the first MacQuarrie, each S 
was again notified to report to the designated room, 
this time the 3 × 5 card bearing the additional
phrase, “for your aptitude test results.” Upon ar- 
rival, each S was told, “In ten minutes I am going 
to give you the results of the aptitude test you took 
previously. But before we do that, you are to take 
a greatly shortened form of that test.” The investi- 
gator again individually administered the MacQuar- 
rie to each S, with the time allowances reduced suffi- 
ciently to prevent Ss’ completing any subtest, and 
thus ensuring the ensuing distribution’s discrimina- 
tive power. 
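Later comparisons refer to pairs of Ss matched during the equating. One plausible procedure is to rank Ss by first-test total score, pair adjacent Ss, and assign one member of each pair to the encouraged group by chance. The article reports only that the groups were equated on means and SDs. The pairing rule, the coin-flip assignment, and the function name here are all assumptions.

```python
import random

def equate_by_matched_pairs(scores, seed=0):
    """Split Ss into two groups from adjacent pairs ranked by first-test score.

    scores: dict mapping subject id -> first-test total score (hypothetical).
    Returns (encouraged, control, unmatched) lists of subject ids.
    """
    rng = random.Random(seed)
    ordered = sorted(scores, key=lambda s: scores[s])
    encouraged, control = [], []
    for i in range(0, len(ordered) - 1, 2):
        pair = [ordered[i], ordered[i + 1]]
        rng.shuffle(pair)  # chance decides which pair member is encouraged
        encouraged.append(pair[0])
        control.append(pair[1])
    # With an odd number of Ss, the highest scorer is left unpaired.
    unmatched = ordered[-1:] if len(ordered) % 2 else []
    return encouraged, control, unmatched


enc, ctrl, left = equate_by_matched_pairs(
    {"a": 10, "b": 11, "c": 20, "d": 21, "e": 30, "f": 31}, seed=1)
print(enc, ctrl, left)
```

Because each pair differs by one score point at most, the two group means in this example can differ by at most 1.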

Encouragement, the independent variable, was in- 
troduced during the second MacQuarrie, which was 
administered with encouraging comments between 
subtests to the members of one group in each anx- 
iety category and without comments to the mem- 
bers of the other group in each category. The lat- 
ter three groups served as controls. The comments 
used are of the kind suggested by authorities like 
Binet (2, p. 34), Terman (20, p. 57), and Wechsler 
(21, p. 171). Comments were employed flexibly 
and in sufficient variety for adaptation to particular 
situations and to individual Ss, the aim being to 
duplicate as closely as possible the normal use of 
encouragement in individual psychological testing. 

Upon completing the second MacQuarrie, each S 
was given a 3 × 5 card bearing on one side the
results of the first MacQuarrie in terms of published
percentile ranks and on the other side a general in- 
terpretation of the results. For interested students, 
the results of both tests were made available in the 
dean’s office in the form of rank-order lists. 




Results 


Comparisons were made of the performance 
on the second MacQuarrie of the encouraged 
low-anxiety group and the nonencouraged low- 
anxiety group, of the encouraged middle- 
anxiety group and the nonencouraged mid- 
dle-anxiety group, and of the encouraged 
high-anxiety group and the nonencouraged 
high-anxiety group. Two-tailed t tests were
employed to determine the significance of 
the differences between each pair of groups. 
Table 1 shows the results of these intergroup 
comparisons as regards total score, number of 
errors, and standard deviation. 
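Below is a sketch of the classical two-sample t test with pooled variance. The article says only that two-tailed t tests were used. The pooled form is an assumption, and the function name is hypothetical. Only the standard library is used, so the function returns t and its degrees of freedom rather than a p value.

```python
import math
from statistics import mean, variance

def pooled_t(group_a, group_b):
    """Student's t for two independent groups with pooled variance.

    Returns (t, degrees of freedom); the two-tailed p value is then read
    from a t table with that many degrees of freedom.
    """
    na, nb = len(group_a), len(group_b)
    pooled_var = (((na - 1) * variance(group_a) + (nb - 1) * variance(group_b))
                  / (na + nb - 2))
    se = math.sqrt(pooled_var * (1 / na + 1 / nb))
    return (mean(group_a) - mean(group_b)) / se, na + nb - 2


t, df = pooled_t([1, 2, 3], [4, 5, 6])
print(round(t, 3), df)  # -3.674 4
```

With 13 Ss per group, as in the low-anxiety comparison, df = 24. At that df, |t| must exceed about 2.06 for two-tailed significance at the .05 level.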

Interindividual comparisons of total scores 
were also made for the pairs of Ss matched 
during the equating of groups. These com- 
parisons reveal that eight of the 13 encour- 
aged low-anxiety Ss surpassed their nonen- 
couraged mates, while five were surpassed; 
eight of the 15 nonencouraged middle-anxiety 
Ss surpassed their encouraged mates, six were 
surpassed, and one was equaled; eight of the 
14 nonencouraged high-anxiety Ss surpassed 
their encouraged mates, five were surpassed, 
and one was equaled. Based upon a normal 
expectancy of an equal number of relative 
improvements and impairments of perform- 
ance for the encouraged Ss in each anxiety 
category, £ tests yield a p < .4 for the low- 
anxiety Ss, a p < .7 for the middle-anxiety 
Ss, and a p < .4 for the high-anxiety Ss. 
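The paired comparisons can also be checked with an exact two-tailed sign test. Under the null hypothesis, this test treats each pair as a coin flip, and it drops ties. It is a swapped-in technique, not necessarily the test the author used, so its p values will not match those printed above. For 8 improvements against 5 impairments it gives p of about .58, and for 6 against 8 about .79. Both results are likewise far from significance. The function name is hypothetical.

```python
from math import comb

def sign_test_two_tailed(wins, losses):
    """Exact two-tailed binomial sign test with p = .5; ties excluded beforehand."""
    n = wins + losses
    k = max(wins, losses)
    # Probability of a split at least this lopsided in one direction.
    tail = sum(comb(n, i) for i in range(k, n + 1)) / 2 ** n
    return min(1.0, 2 * tail)


print(round(sign_test_two_tailed(8, 5), 2))  # 0.58
print(round(sign_test_two_tailed(6, 8), 2))  # 0.79
```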


Discussion 


The one finding of statistical significance
indicates that encouragement is related to
increased variability of performance among
low-anxiety Ss as selected in this experiment.
Nonsignificant trends in the same direction
were found for the medium- and high-anxiety
Ss. These findings accord with the findings
of previous experiments on encouragement
(7, 23).


Table 1

Differences Between Encouraged and Nonencouraged Groups on the MacQuarrie Test for Mechanical Ability

                  Low-Anxiety Ss                  Middle-Anxiety Ss               High-Anxiety Ss
                  (13 encouraged, 13 not)         (15 encouraged, 15 not)         (14 encouraged, 14 not)
Factor compared   Difference  t        p          Difference  t        p          Difference  t        p
Total score       4.15*       .95      <.4        .74†        .15      [illeg.]   4.43*       1.00     <.4
No. of errors     1.00*       [illeg.] <.9        1.67*       .40      <.7        5.35†       1.45     <.2
SD                9.14‡       2.55     <.05       4.47‡       1.44     <.2        5.06‡       1.19     <.3

* Favors encouraged group.
† Favors nonencouraged group.
‡ SD greater for encouraged group.

Other nonsignificant trends point to the pos- 
sibility that low-anxiety Ss perform better un- 
der encouragement. More of these Ss did 
better under encouragement than did worse 
(eight to five). The encouraged group of 
low-anxiety Ss also obtained a better aver- 
age score than the nonencouraged group. 
The direction of these trends accords with 
both experimental findings (8, 14) and theo- 
retical considerations (1, 17). Encourage- 
ment would tend to involve the self-esteem 
of low-anxiety Ss and thereby increase their 
motivation, without producing in many such 
Ss the hampering effects of stirred-up anx- 
iety. Having presumably little anxiety to 
start with, most low-anxiety Ss would not be 
likely to reach the critical point in motiva- 
tion beyond which impairment of perform- 
ance sets in. 

Although very few experiments have in- 
cluded middle-anxiety Ss, theorization leads 
to a similar expectation that encouragement 
would involve the self-esteem of most such 
Ss so as to bring about improved perform- 
ance, their initial level of anxiety being con- 
sidered sufficiently low not to be raised be- 
yond the point of efficient performance. This 
expectation was not borne out by the find- 
ings, which indicate (though at a chance 
level) that the performance of more middle- 
anxiety Ss was impaired than improved un- 
der encouragement (eight to six). 

The effect of encouragement on middle- 
anxiety Ss, it should be emphasized, is of 
major importance. By virtue of their selec- 
tion from the more heavily populated middle 
of the distribution, such Ss represent a larger 
proportion of the total population than do Ss 
selected from either tail of the distribution. 
If encouragement has the effect of producing 


improved performance in some middle-anxiety


Ss and impaired performance in others, the 
latter result would reflect an adverse influ- 
ence upon a relatively large number of Ss. 


Studies of anxiety (5, 13, 22) and stress 
(10, 11, 15) indicate that a high level of 
anxiety, whether existent or induced in Ss, 
generally brings about impaired performance, 
but occasionally causes improvement. The 
level, though high, may in some Ss not ex- 
ceed the point of maximum efficiency. An 
increase in errors without a decrease in total 
score has been found frequently. This latter 
finding appears corroborated in the present 
study, although at a nonsignificant level. 
While the encouraged group of high-anxiety 
Ss obtained a better average score than the 
nonencouraged group, it made more errors. 
Corroborative though nonsignificant, too, is 
the present finding that more high-anxiety 
Ss did worse under encouragement than better 
(eight to five). 

The inconclusive findings of this study may 
be due to a weakness in the experimental de- 
sign, for the Ss’ familiarity with the Mac- 
Quarrie test on taking it the second time 
may have reduced their anxiety and, in turn, 
the possible differential effects of encourage- 
ment on their performance. 

The problem approached in this study 
seems worthy of further investigation. If en- 
couragement produces higher scores for some 
testees, to that extent it may add to accuracy 
of measurement, since accuracy of measure- 
ment depends in part upon testees’ perform- 
ing maximally. If encouragement yields 
lower scores for some testees, however, to 
that extent it would seem to detract from ac- 
curacy of measurement. 

Also of importance is the possible effect 
upon those testees whose scores may be low- 
ered as a result of encouragement. If, as 
both experimentation and theorization have 
indicated, encouragement tends to heighten 
self-esteem involvement in the testing situa- 
tion, Ss with unwarrantedly low scores may 
suffer an unwarranted blow to their self- 
esteem. Another danger is that such Ss may 
acquire a distorted notion of that segment of 
themselves which has been measured. Such
distortion may do damage to the realistic 
self-evaluation considered to be one of the 
goals of counseling and psychotherapy. 

Should further investigation confirm these 
possibilities, caution would seem to be indi- 




cated in the use of encouragement in indi- 
vidual psychological testing. Its blanket use 
might preferably be superseded by its selec- 
tive use, once an adequate criterion for selec- 
tion has been discovered or developed. 


Summary 


To determine the effect of encouragement 
on the individual test performance of Ss with 
varying amounts of anxiety, two anxiety 
scales were first administered to a sample of 
college students. From the combined dis- 
tribution of anxiety scores, three groups of 
Ss were selected and designated low-, me- 
dium-, and high-anxiety. To each S was in- 
dividually administered the MacQuarrie Test 
for Mechanical Ability, no comments being 
made by the examiner. On the basis of 
scores on this test, each anxiety category was 
divided into two equated groups. Six weeks 
later the test was again individually adminis- 
tered to each S, this time encouraging com- 
ments being offered between subtests to one 
group in each category but not to the other. 

Two-tailed t tests revealed only one
significant finding: the performance of the low-
anxiety Ss displayed increased variability un- 
der encouragement. The possible pertinence 
of nonsignificant trends to previous experi- 
mental findings and to theoretical considera- 
tions was discussed. The disadvantage to 
the design of the Ss’ taking the same test 
twice was mentioned. It was suggested that 
further investigation of the problem is mer- 
ited, so that testees whose performance is 
adversely affected by encouragement may 
eventually be detected in advance and tested 
without encouraging comments. 


Received October 13, 1955. 


References 


1. Ausubel, D. P., Schiff, H. M., & Goldman, M.
Qualitative characteristics in the learning process
associated with anxiety. J. abnorm. soc.
Psychol., 1953, 48, 537-547.

2. Binet, A., & Vaschide, N. Expériences de force
musculaire et de fond chez les jeunes garçons.
Année Psychol., 1897, 4, 15-63.

3. Bingham, W. V. Aptitudes and aptitude testing.
New York: Harper, 1937.

4. Cole, D. Communication and rapport in clinical
testing. J. consult. Psychol., 1953, 17,
132-134.

5. Farber, I. E., & Spence, K. W. Complex learning
and conditioning as a function of anxiety.
J. exp. Psychol., 1953, 45, 120-125.

6. Goodenough, Florence L. Mental testing. New
York: Rinehart, 1949.

7. Hurlock, Elizabeth B. The value of praise and
reproof as incentives for children. Arch.
Psychol., 1924, No. 71.

8. Kaye, D., Kirschner, P., & Mandler, G. The
effect of test anxiety on memory span in
group test situation. J. consult. Psychol.,
1953, 17, 265-266.

9. Kitzinger, Helen, & Blumberg, E. Supplementary
guide for administering and scoring
Wechsler-Bellevue Intelligence Scale (Form I).
Psychol. Monogr., 1951, 65, No. 2 (Whole
No. 319).

10. Lazarus, R. S., & Eriksen, C. W. Effects of
failure stress upon skilled performance. J.
exp. Psychol., 1952, 43, 100-105.

11. Lucas, J. D. The interactive effects of anxiety,
failure, and intraserial duplication. Amer. J.
Psychol., 1952, 65, 59-66.

12. McCreary, Joyce B., & Bendig, A. W. Comparison
of two forms of the Manifest Anxiety
Scale. J. consult. Psychol., 1954, 18, 206.

13. Maltzman, I., Fox, J., & Morrisett, L., Jr. Some
effects of manifest anxiety on mental set. J.
exp. Psychol., 1953, 46, 50-54.

14. ... of verbal performance. J. exp. Psychol., 1953,
46, 120-124.

15. Rosenbaum, G. Stimulus generalization as a
function of level of experimentally induced
anxiety. J. exp. Psychol., 1953, 45, 35-43.

16. Sarason, S. B., & Mandler, G. Some correlates
of test anxiety. J. abnorm. soc. Psychol.,
1952, 47, 810-817.

17. Sears, R. Motivational factors in aptitude testing.
Amer. J. Orthopsychiat., 1943, 13,
468-492.

18. Super, D. E. Appraising vocational fitness.
New York: Harper, 1949.

19. Taylor, Janet A. A personality scale of manifest
anxiety. J. abnorm. soc. Psychol., 1953,
48, 285-290.

20. Terman, L. M., & Merrill, Maud A. Measuring
intelligence. Boston: Houghton Mifflin, 1937.

21. Wechsler, D. The measurement of adult intelligence.
(3rd Ed.) Baltimore: Williams &
Wilkins, 1944.

22. Westrope, Martha R. Relations among Rorschach
indices, manifest anxiety, and performance
under stress. J. abnorm. soc. Psychol.,
1953, 48, 515-524.

23. Wild, E. H. Influence of conation on cognition.
Part II. Brit. J. Psychol., 1928, 18, 332-355.




An Evaluation of Two Approaches to Discipline 
in Industry 


Norman R. F. Maier and Lee E. Danielson 


University of Michigan 


The problem of disciplinary action in in- 
dustry has long plagued supervision at all 
levels, especially at the first line. Although 
there are many approaches to the problem, 
they can be roughly classified into two main 
types: judicial and human relations. The 
judicial approach is characterized by an at- 
tempt to determine the rightness or wrong- 
ness of an employee’s actions in a particular 
situation. If the worker was “wrong,” the 
supervisor metes out the predetermined pun- 
ishment. The emphasis is on the solution to 
the immediate problem rather than on the 
possible consequences of the decision. Get- 
ting the facts, screening out opinions, and 
finally weighing the evidence are important 
steps in the judicial approach. 

The human relations approach is charac- 
terized by an emphasis on problem solving. 
The question of rightness or wrongness of be- 
havior is subordinate to the question of “How 
can I encourage this worker to perform in a
desirable manner?” As in most problem solv- 
ing, the supervisor’s behavior is characterized 
by flexibility and adaptiveness, with the re- 
sult that a variety of solutions may be fol- 
lowed on different occasions in gaining the 
same objective. 

The study reported uses a role-playing case 
to determine how supervisors behave in a 
situation involving a disciplinary problem. 
Multiple role playing (1, 2) is used in order 
to permit a comparison of outcomes reached 
by different participants under identical test 
conditions. 


Method 
Subjects 


This study was conducted during the Foremen’s 
Conference at the University of Michigan (April, 
1954). Supervisors, representing a wide variety of
industries and several levels of management, partici- 
pated. The program was repeated on two successive 
days and data were obtained from over 500 indi- 
viduals. 


Procedure for Role Playing 


After hearing a lecture on the topic of attitudes 
and how to deal more effectively with them, the 
audience was divided into 12 workshops averaging 
approximately 42 men, each being conducted by a 
trained conference leader. In dividing the audience 
into workshop groups, care was taken to see that 
two men from one company would not be in the 
same group. 

When the workshops had convened in their sepa- 
rate rooms, each conference leader informed his 
group that they would participate in a case study 
called the “No Smoking” problem. Since this case
involved three persons, the men were asked to form
three-man groups so that many sets of persons
could study the case independently. A little time
was then spent to give the men a general idea of
the role-playing approach to case studies. The
conference leader then read some general instructions
describing the job and the working conditions, and
also mentioned the fact that the foreman had just
laid off a worker for a period of three days for
violating the company smoking rule. Separate
instructions were supplied for the part each member of the
group was to play. These were the foreman, the 
worker, and the union steward. The worker was 
not involved in the role playing, but was present 
only to observe so that he could later pass judg- 
ment on his satisfaction with the outcome. It was 
the steward’s objective to get the foreman to re- 
verse his decision, and it was the interview between 
the foreman and the steward that was role played. 
The situation described in the roles made it clear 
that a violation had occurred, that the worker knew 
he was violating a rule, and that there was a spe- 
cific penalty. However, the worker felt he could 
not afford the layoff, and because the steward re- 
garded the employee to be a conscientious worker 
who sneaks fewer smokes than others, he was willing 
to make an issue out of the incident. After 20 min- 
utes of interaction, role playing was terminated and 
preparations for analysis and discussion were made. 


Procedure for Obtaining Data 


Data on the following points were collected from 
one three-man team at a time: 

1. The solution or decision reached. 

2. The type of interaction between steward and 
foreman. Three classifications were described: (a) 
argument (each person presents his side of the case 
and appears unsympathetic with the other’s side);
(b) problem-solving discussion (each tries to
understand the situation of the other and both proceed to
work out a solution to the conflict in their
situations); (c) intermediate type of interaction.
Participants always reached agreement on the
classification.

3. Satisfaction with the solution. Each partici- 
pant reported his feeling about the decision. 

4. Worker’s response to three questions: (a) Were 
you satisfied with the steward’s defense of you? 
(b) Will you vote for this steward at the next un- 
ion election? (c) Will your work suffer because of 
the way your case was handled? 

5. Steward’s intention to file a grievance. 


Results 


The outcomes of the discussions have been 
grouped into three general classifications, as 
follows: (a) no decision, (b) full layoff, and 
(c) adjusted solutions. The “no decision” 
classification includes some incomplete dis- 
cussions but most of them were deadlocks in 
which each refused to give in, but the fore- 
man nevertheless was reluctant to carry out 
the penalty without the steward’s support.
The “full layoff” classification means that the 
foreman carried out the penalty as pre- 
scribed by the company rule. This was a 
three-day layoff, and since there was no ques- 
tion about the violation the “judicial” ap- 
proach called for this solution. The “ad- 
justed” classification includes all cases in 
which the foreman made some adjustment. 
Solutions in which an agreement is sought 
stem from the “human relations” approach. 
These ranged from reducing the penalty to 
overlooking the incident. Nearly half of the 
adjusted solutions were agreements to reduce 
the three-day layoff to a reprimand or warn- 
ing. In the case of adjusted solutions the 
steward usually agreed to support the no- 
smoking rule in the future. 

The results are shown in Table 1. It is
important to note that 52%, or better than
half of the foremen, did not follow the letter
of the rule, despite the fact that the rule was
clear and had no provision for leniency. Included
in this 52% are decisions (9%) to consult
higher management in order to get permission
to make an exception or to let higher
management make the decision, but the other
43% of the foremen took it upon themselves
to follow the “human relations” approach.
A little more than a third (35%) of the foremen
had the courage, the confidence, or whatever
trait is needed, to carry out the letter of
the law and avoid the possible charge of
discriminatory practice. Only 13% failed to
settle the problem in the allotted time. The
consequences of these three types of outcomes
are apparent from the remainder of
the table.


Table 1
Satisfaction with Various Solutions

                                  No Decision   Full          Adjustment
                                  Reached       Layoff
Instances in 172 groups           23 (13%)      60 (35%)      89 (52%)

Satisfaction with solution (%)
  Foreman                         17            70            86
  Steward                         9             30            80
  Worker                          –*            15            71
Interaction (%)
  Argument                        30            24            3
  Intermediate                    48            39            33
  Problem solving                 22            37            64
Worker’s reaction (%)
  Satisfied with defense          61            52            93
  Will vote for steward           74            47            91
  Will reduce production          43            40            6
Steward will file grievance (%)   43            45            2

* Workers cannot express their satisfaction with the outcome
because they are uncertain as to whether they will be laid off.
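The outcome percentages in Table 1 follow directly from the group counts. A short check, using only the counts printed in the table:

```python
# Outcome counts from Table 1 (172 three-man role-playing groups).
outcomes = {"no decision": 23, "full layoff": 60, "adjustment": 89}
total = sum(outcomes.values())
shares = {name: round(100 * count / total) for name, count in outcomes.items()}
print(total, shares)  # 172 {'no decision': 13, 'full layoff': 35, 'adjustment': 52}

# The text splits the 52% into 9% referrals upward and 43% human-relations adjustments.
assert shares["adjustment"] == 9 + 43
```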

Satisfaction of foreman, steward, and worker
varies together across outcomes: it is greatest for
adjusted solutions and least for cases in
which no decision is reached. For each person
involved, the difference in satisfaction between
full layoff and adjusted solutions is significant
at better than the 1% level of
confidence.

The type of solution reached also is re- 
lated to the type of discussion that occurred. 
The problem-solving type of discussion was 
associated primarily with adjusted solutions, 
whereas argumentative approaches led to 
deadlocks and judicial solutions. A chi- 
square test for the relationship between type 
of meeting and type of outcome is significant 
at the 1% level of confidence. 
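The chi-square result above can be roughly checked with a short Python sketch. Table 1 gives only percentages. The sketch therefore assumes that "type of meeting" means the three interaction categories. It rebuilds approximate cell frequencies by multiplying each percentage by its column size (23, 60, 89). These approximations are not the authors' raw data, and the helper name `chi_square` is hypothetical.

```python
# Approximate chi-square test of independence for Table 1
# (type of interaction x type of outcome). The cell frequencies are
# reconstructed from published percentages, so they are only approximate.

def chi_square(table):
    """Return (chi-square statistic, degrees of freedom) for an r x c table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / grand
            stat += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return stat, df

group_sizes = {"no decision": 23, "full layoff": 60, "adjustment": 89}
# Percent of groups showing argument / intermediate / problem-solving interaction
percentages = {
    "no decision": (30, 48, 22),
    "full layoff": (24, 39, 37),
    "adjustment": (3, 33, 64),
}
table = [
    [n * p / 100 for p in percentages[outcome]]
    for outcome, n in group_sizes.items()
]

stat, df = chi_square(table)
CRITICAL_01_DF4 = 13.277  # tabled chi-square value for p = .01 with df = 4
print(f"chi-square = {stat:.1f}, df = {df}, significant at .01: {stat > CRITICAL_01_DF4}")
```

The reconstructed frequencies are not whole numbers, so the statistic only approximates whatever the authors computed. With the real counts, a library routine such as SciPy's `chi2_contingency` would also return an exact p-value.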




Two Approaches to Discipline in Industry 


The third set of comparisons, called worker reactions, shows the workers to be most often satisfied with their stewards when an adjustment results, and least often satisfied when given a full layoff. When an adjustment results, the percentage of workers who are satisfied with the steward and the percentage who will vote for him again are each significantly different (at the 1% level of confidence) from the percentages obtained on these two items with the other two outcomes. The adjusted solution also resulted in only 6% of the workers saying that they would reduce production. By contrast, 40% and 43% indicated reduced production for “full layoffs” and “failure to reach a decision,” respectively (both significantly different at the 1% level of confidence).

A comparison of the number of stewards who will file grievances may be used as a measure of the extent to which the various solutions failed to lead to a final settlement of the dispute. Table 1 clearly shows that only when an adjustment is reached can there be any degree of confidence that the problem has been settled. Each of the other two outcomes led more than 40% of the stewards to say that they intended to file grievances. These frequencies are significantly different (at less than the 1% level of confidence) from the 2% figure obtained when an adjustment is reached.
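The paper does not say which test produced these 1%-level comparisons. One common choice for comparing two independent percentages is the pooled two-proportion z-test. It is sketched here as an assumption, and the function name is hypothetical. The example uses the grievance figures from Table 1: 2% of the 89 adjusted cases against 45% of the 60 full-layoff cases.

```python
import math

def two_proportion_z(p1, n1, p2, n2):
    """Pooled z statistic for the difference between two independent proportions."""
    pooled = (p1 * n1 + p2 * n2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    return (p2 - p1) / se

# Stewards who will file a grievance: adjustment (2% of 89) vs. full layoff (45% of 60)
z = two_proportion_z(0.02, 89, 0.45, 60)
# The two-tailed critical value at the 1% level is about 2.576
print(f"z = {z:.2f}; significant at .01: {abs(z) > 2.576}")
```

The same function applies to the production-restriction figures: 6% of 89 adjusted cases against 40% of 60 full-layoff cases.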

A breakdown of the adjusted solutions that were lumped together in the last column of Table 1 includes the following:


1. Reduced layoff (either two days or one). 

2. Forgiven (no layoff and no reprimand 
stated, but a warning may be implied). 

3. Warning and reprimand (it is made clear 
that the employee is at fault and that the 
next person violating the rule will be laid 
off; violation may be entered in personnel 
record). 

4. Consulting higher management (to ob- 
tain permission to make an exception, to 
change the rule or make a special decision 
in the case). 

5. Consulting workers (to determine what should be done in this case and being willing to abide by the decision).

6. Other (the few solutions that cannot be classified in the above categories).


Table 2 shows the satisfactions, the type 
of interaction, the worker’s reaction, and the 
steward’s follow-up for each of these adjusted 
solutions. Since the numbers of cases in some of the categories are rather small, the relationships obtained can only be suggestive.


Table 2
Satisfaction with Different Adjustments

                                  Reduced  Forgiven  Warning,   Consult     Consult  Other
                                  Layoff             Reprimand  Management  Workers
Instances in 89 groups            ?        ?         ?          ?           ?        ?
Satisfaction with solution (%)
  Foreman                         ?        ?         ?          ?           ?        ?
  Steward                         75       ?         ?          ?           ?        ?
  Worker                          62       ?         ?          ?           ?        ?
Interaction (%)
  Argument                        0        ?         ?          ?           ?        ?
  Intermediate                    37       ?         ?          ?           ?        ?
  Problem solving                 63       ?         ?          ?           ?        ?
Worker's reaction (%)
  Satisfied with defense          ?        ?         ?          ?           ?        ?
  Will vote for steward           62       ?         ?          ?           ?        ?
  Will reduce production          25       ?         ?          ?           ?        ?
Steward will file grievance (%)   9        ?         ?          ?           ?        ?

? Entry illegible in the scanned original.




Discussion 


In another investigation dealing with a disciplinary problem, the writers¹ found foremen reluctant to enforce a safety rule. Only 7% of them laid off the man who violated the rule. The foremen gave two major reasons for not following the letter of the rule: (a) the penalty of a three-week layoff was too strict, and (b) the foremen used in the case were not sure that a violation had occurred. The fact that 45% of the workers admitted the violation indicates that the second reason was more of an excuse than a reason for not applying the penalty.

In the present study the rule has been 
made less strict and the doubt as to whether 
a violation has occurred has been removed. 
Although the prescribed penalty is now ap- 
plied in 35% of the instances, 52% of the 
foremen fail to follow the letter of the law 
and instead either reduce or omit the penalty. 
The other 13% are unable to resolve the 
problem in the time allowed. Regardless of 
how one feels about rules, the fact remains 
that the persons who are entrusted to apply
them are not doing so to the extent that is 
ideally supposed. If a rule is used as a for- 
mula to treat everyone in the same way, it is 
not accomplishing this objective. If fair treat- 
ment is a sound management objective, it is 
reasonable to question whether fairness can 
be legislated in a company. Different fore- 
men permit their own feelings and attitudes 
to determine how a violator will be treated, 
and they apparently accept this as a practical 
thing to do. Thus human factors influence 
the methods of dealing with people as soon 
as the authority figure comes face to face 
with the person who is to be punished. 

The present experimental data demonstrate 
not only that foremen are inclined to use 
what we have called the human relations ap- 
proach, but that this approach is more likely 
to produce desirable results and satisfaction 
for all concerned than is the judicial ap- 
proach, which is characterized by a consid- 


1 L. E. Danielson and N. R. F. Maier. Supervisory problems in decision making: an experimental study of safety. Unpublished manuscript.




eration of the factual evidence and the de- 
termination of innocence or guilt. 

What are the forces causing foremen to shy away from following company rules and from using the judicial approach? Group discussions in connection with problems of this kind reveal two kinds of reasons.

1. Foremen believe they can get better cooperation from their men if they treat them with consideration. When a particular situation is presented to them as a case study, most foremen today believe that a layoff will not solve the problem. In general discussions, however, they will argue in favor of rules and the need for consistency.

2. Foremen are reluctant to risk a walkout. They believe that if a grievance results from disciplinary action, the company is likely to reverse the foreman's decision. In any case they do not relish the emotional problems involved in grievance proceedings. Even if backed up by the company, they fear they will lose standing in the estimation of their superiors if they figure in the trouble that grievance proceedings cause.

The first of these factors is emotional in 
nature in that there is a failure to accept the 
job of punishing a good employee. The sec- 
ond reason is more intellectual in nature and 
perhaps is a conclusion that has been reached 
through bitter experience. Foremen discover 
that they must learn to get along both with 
their superiors and with the union repre- 
sentatives. They have no final authority or 
power since the union has taken the big stick 
from them, and as a consequence they must 
use their wits. The judicial approach as- 
sumes that force is available and this is some- 
what unrealistic in our present society. The 
human relations approach respects the feel- 
ings of people and these feelings become a 
factor in reaching decisions. In the present 
study the foreman had all the objective facts 
on his side and the steward had little more 
than “feelings” to support his position. The 
“feeling” side of the issue was strong enough 
to win adjusted solutions in 52% of the cases. That these adjusted solutions were not victories achieved by the steward through threat of a walkout is indicated by the fact that satisfaction with the outcome was highest






for foremen when adjusted solutions were 
reached. No foreman experienced the ad- 
justed solution as a defeat. 

The experimental findings support the con- 
clusion that rules and penalties can no longer 
be regarded as effective procedures for con- 
trolling behavior or maintaining discipline. 
This means that new ways of controlling be- 
havior must be found. These must allow the 
foreman sufficient freedom to act so that he 
is not restrained by rules and can use his 
discretion. However, if the human factor is 
increased by moving away from the judicial to
the human relations approach, it means that 
foremen must be selected and trained to use 
human relations properly. In other words, 
foreman training is an essential part of a 
motivation program utilizing positive incen- 
tives. 

Summary 


In order to study the kinds of issues involved in a practical disciplinary problem, industrial supervisors were placed in a role-playing situation requiring that disciplinary
action be taken. One person played the part 
of the supervisor, another the part of the un- 
ion steward, and a third person identified 
himself with the worker who was to be disci- 
plined. The background of the case made 
the violation of a no-smoking rule clear-cut 
so that a three-day layoff was in order. The 
steward intervened, however, and his func- 
tion was to get the foreman to change his 
decision. 

The results obtained are as follows: 

1. A total of 89 (52%) foremen altered 
their decisions and reached adjusted solu- 
tions. They tended to follow the human re- 
lations approach. 

2. A total of 60 (35%) foremen persisted 




in their decisions and were governed by the 
fact that the worker was guilty. They fol- 
lowed the judicial approach. 

3. The remaining 23 foremen (13%) failed 
to settle the matter in the time allowed. 
They were reluctant to change their decisions 
and also hesitated to take a stand. 

4. The human relations approach was more 
successful than the judicial approach in that 
(a) satisfaction for foremen, stewards, and 
workers was greater; (b) the interview was 
more of a problem-solving type discussion 
than an argument; (c) the worker was more 
inclined to be satisfied with the steward; (d) 
the worker was less inclined to reduce his 
future production; and (e) the steward was 
less inclined to file a grievance.

5. Adjusted solutions varied in nature, but 
more than half of them omitted the three- 
day penalty altogether. 


It is concluded that rules hamper the su- 
pervisor and place him in the awkward po- 
sition of either showing disrespect for higher 
management or a disregard for the feelings 
of his men. New ways in discipline must be sought, and these require training in human relations. Rules can function only when power to enforce them exists. Even then they do not create positive motivation. In the absence of power, foremen must be allowed and trained to use human relations skills.
Received November 23, 1955. 


References 


1. Maier, N. R. F. Principles of human relations. 
New York: Wiley, 1952. 

2. Maier, N. R. F., & Zerfoss, L. F. MRP: a technique for training large groups of supervisors and its potential use in social research. Hum. Relat., 1952, 5, 177-186.


The Journal of Applied Psychology 
Vol. 40, No. 5, 1956 


Social Desirability and Responses to Items from Three 
MMPI Scales: D, Sc, and K 


Charles Hanley 
Michigan State University 


Psychologists long have been concerned 
with the effect of response sets on personality 
test scores. Of particular interest is the set 
leading an individual to respond to specific 
test items according to conscious or uncon- 
scious needs to appear as a superior person. 
In order to assess the presence of this set, 
and to correct scores for its effects, special 
scales have been developed, one of the most 
familiar being K, the suppressor scale for the 
Minnesota Multiphasic Personality Inven- 
tory. Several MMPI scales are corrected for 
such “defensiveness” by the addition of vari- 
ous amounts to raw scores, depending on the 
value of K received by the subject. 

A different approach to the study of this 
response set has been employed by Edwards 
(2). Using an inventory of his own design, 
Edwards had judges rate items on a 9-point 
scale of desirability. The inventory also was 
given as a personality test to another sample. 
The probability that an item would be en- 
dorsed (i.e., answered “true”) in the testing 
situation was found to be highly related to 
its judged desirability, the correlation be- 
tween these variables being .87. Edwards 
suggests two factors might underlie this re- 
lationship: (a) traits judged to be desirable 
are just those which are common in the popu- 
lation, and (b) subjects in general attempt 
to give good impressions of themselves when 
taking personality tests. Both factors are 
important in connection with the problem of 
defensiveness. 

The primary aim of the two experiments 
to be reported was to determine the extent 
to which the relationship discovered by Ed- 
wards holds for other more widely used in- 
ventories, specifically, the MMPI. Experi- 
ment I, in addition, was designed so that a 
scale with a maximal K correction, the Sc 
(Schizophrenia) scale, and a scale without a 
K correction, the D (Depression) scale, could 
be compared with respect to the relationship 


between social desirability and probability of 
endorsement. Experiment II investigated this 
relationship in the K scale itself, in order to 
obtain evidence regarding the validity of 
that dimension. 


Experiment I: Sc and D 


Method. Random samples of 25 of the 60 D 
items and 32 of the 78 Sc items were reproduced in 
mimeograph form, four items being common to both 
samples. These items constitute slightly more than 
40% of the two populations of scale items. 

The selected items were rated on a 9-point scale 
of social desirability by 43 male and 44 female un- 
dergraduates at Michigan State University. The 
judges were instructed to make their ratings in ac- 
cordance with their opinion as to how “people in 
general” felt about the attitudes expressed by the 
items. The social desirability of an item then could 
be expressed by its median rating on this scale. 

Probability of endorsement was obtained by count- 
ing the pertinent responses of 64 male and 42 fe- 
male undergraduates who had taken the MMPI in 
connection with an earlier study at Michigan State. 
The probability of endorsement of an item is the 
decimal ratio of the number of subjects answering 
“true” to the total number answering the item. 
Thus, the item rated the most socially undesirable, 
“Most of the time I wish I were dead,” had a prob- 
ability of endorsement of .01. The neutral item, “I 
am easily awakened by noise,” had an endorsement 
value of .35, while the item judged most desirable 
of all, “I loved my mother,” had a probability of 
endorsement of .93. 
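The two item statistics defined in this Method section can be computed as follows. Only the definitions come from the text: the median judge rating on the 9-point scale, and the proportion answering "true" among those who answered. The ratings and responses are made-up values for one hypothetical item. Omitted responses are assumed to be recorded as `None`.

```python
from statistics import median

def social_desirability(ratings):
    """Median judged desirability of an item on the 9-point scale."""
    return median(ratings)

def endorsement_probability(responses):
    """Proportion answering 'true' among the subjects who answered the item.

    `responses` holds True, False, or None (item omitted).
    """
    answered = [r for r in responses if r is not None]
    return sum(answered) / len(answered)

# Hypothetical data for a single item
ratings = [2, 3, 3, 4, 2, 1, 3]
responses = [True, False, False, None, False, True, False, False]

print(social_desirability(ratings))
print(endorsement_probability(responses))
```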


Results. Most of the items received unde- 
sirable ratings. Of the Sc items, there were 
24 rated as 4.0 or less (undesirable), 4 be- 
tween 4.0 and 6.0, and 4 as 6.0 or more (de- 
sirable). Of the D items, 13 were rated as 
4.0 or less, 7 had ratings between 4.0 and 6.0, 
and 5 received ratings of 6.0 or more. The 
Sc scale has relatively fewer neutral items 
than D. This may be one reason why a cor- 

1 Edwards converted median ratings into values on an equal-interval scale by applying the method of successive categories (1). Scale values in the present study were so highly correlated with rated values that the latter measures are used for purposes of simplicity.




Social Desirability and Responses to MMPI Items 


rection is applied for Sc but not for D. In 
this connection, Wiener (5) notes that the 
Sc scale lacks “subtle” items. 
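The three rating bands used in these Results can be written as a small classifier. As an illustration, it is applied to the D-Sc list ratings of the 11 common items given in Table 1 under Experiment II. The function name is hypothetical. Ratings strictly between 4.0 and 6.0 are labeled "neutral," following the text.

```python
from collections import Counter

def desirability_band(median_rating):
    """Classify a median rating using the cut points described in the text."""
    if median_rating <= 4.0:
        return "undesirable"
    if median_rating >= 6.0:
        return "desirable"
    return "neutral"  # strictly between 4.0 and 6.0

# D-Sc list ratings of the 11 items common to both experiments (Table 1, Experiment II)
common_item_ratings = [5.0, 7.6, 2.6, 4.4, 3.8, 4.0, 8.6, 8.8, 2.5, 4.3, 1.2]
counts = Counter(desirability_band(r) for r in common_item_ratings)
print(counts)
```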

Two kinds of relationships are worth de- 
scribing. The first is that between social 
desirability and probability of endorsement. 
The product-moment correlation between these 
variables is .82 for the D items and .89 for 
the Sc items.² Both coefficients are of the
same order of magnitude as the correlation 
reported by Edwards for his inventory. 
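The product-moment correlation reported here is the ordinary Pearson coefficient, computed over items. A minimal sketch follows. The five (rating, endorsement) pairs are hypothetical, chosen only to show a strong positive relationship of the kind reported. They are not the study's data.

```python
import math

def pearson_r(x, y):
    """Product-moment correlation between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(ss_x * ss_y)

# Hypothetical items: median desirability rating and probability of endorsement
desirability = [1.2, 2.6, 4.0, 5.0, 8.8]
endorsement = [0.01, 0.10, 0.35, 0.30, 0.93]
print(round(pearson_r(desirability, endorsement), 2))
```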

Fiducial limits for these coefficients cannot 
be ascertained by the usual methods, since 
both item samples were drawn from small, 
finite populations. Possibly more important 
as a source of uncertainty is the number of 
judges and number of subjects employed in 
the study. Increasing these groups in size 
undoubtedly would yield more reliable esti- 
mates of the values entering into the com- 
putations, but this fact cannot be expressed 
in terms of the standard error of a correla- 
tion coefficient. 

The foregoing indicates that a person who 
answers items in the socially desirable man- 
ner will perform much as does the average 
college student. This observation raises the 
question of the relationship of social desir- 
ability to the manner in which items are 
scored on the keys. By classifying “true” re- 
sponses into the dichotomy “keyed-unkeyed,” 
a point-biserial correlation may be computed 
using social desirability as the continuous 
variable. For the Sc items, this coefficient is 
.84; for the D items, it is .58. Thus, it ap-
pears that the scoring of the Sc scale is more 
highly related to social desirability than is 
the case with D. On both scales, however, a 
tendency to answer in the socially desirable 
manner will result in lowered raw scores. 
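With the dichotomy coded 0/1, the point-biserial coefficient is simply the product-moment correlation. It can be computed as r_pb = (M1 - M0) / s * sqrt(p * q), where:

- M1 and M0 are the mean ratings in the two categories;
- p and q are the proportions of items in each category;
- s is the standard deviation of all ratings (dividing by N).

The sketch assumes that "true" answers which are unkeyed are coded 1. The items and the function name are hypothetical.

```python
import math

def point_biserial(continuous, dichotomy):
    """Point-biserial r between a continuous variable and a 0/1 variable.

    Uses r_pb = (M1 - M0) / s * sqrt(p * q), with s the population SD
    (divide by N) of the continuous variable.
    """
    n = len(continuous)
    ones = [x for x, d in zip(continuous, dichotomy) if d == 1]
    zeros = [x for x, d in zip(continuous, dichotomy) if d == 0]
    mean_all = sum(continuous) / n
    s = math.sqrt(sum((x - mean_all) ** 2 for x in continuous) / n)
    p = len(ones) / n
    q = 1 - p
    m1 = sum(ones) / len(ones)
    m0 = sum(zeros) / len(zeros)
    return (m1 - m0) / s * math.sqrt(p * q)

# Hypothetical items: desirability rating, and 1 if a "true" answer is unkeyed
ratings = [1.5, 2.0, 3.1, 4.4, 6.2, 7.8]
true_is_unkeyed = [0, 0, 0, 1, 1, 1]
print(round(point_biserial(ratings, true_is_unkeyed), 2))
```

A positive coefficient under this coding means that the more desirable an item, the more likely a "true" answer earns no point. That is why a desirable-answer set lowers raw scores.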


Experiment II: K 


Method. In the second study, the procedure was 
similar to that used in Experiment I. A list of 29 
of the 30 K items was mimeographed; the omission 
of the thirtieth item was not noted until the data 
had been collected. Five non-K items from the ear- 
lier study were included; these plus six K items also 


2 When probability of endorsement is estimated from data obtained by Hathaway on a normative sample of 152 college males and 113 college females, the correlations are .92 for the Sc items and .86 for the D items.




present in the D and Sc samples give an overlap of 
11 items between the two lists.

A new sample of judges was employed: 43 male 
and 34 female undergraduates at Michigan State. 
The instructions were those used in Experiment I. 
Again, social desirability is represented by the me- 
dian rating given items by the judges. 

Probability of endorsement is based on counts of 
responses in MMPI records accumulated in a sepa- 
rate experiment with students in introductory psy- 
chology classes. The records of 63 males and 37 
females were used to obtain probability of endorse- 
ment in a “typical” college sample. Records of 32 
males and 19 females having raw K scores greater 
than 19 (standard scores of 64 or more) were used 
to obtain endorsement values in a “High-K” group, 
while probability of endorsement in an “Average-K” 
group was secured from the records of 42 males and 
29 females having raw K scores ranging from 11 to 
14 (standard scores of 48-53). 
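The grouping rule for the two comparison samples can be stated as code. The cut points come from the text: raw K above 19 for "High-K," and 11 through 14 for "Average-K." The scores are hypothetical. Here, records falling in neither range are simply left out of both groups.

```python
def k_group(raw_k):
    """Assign a record to an Experiment II comparison group by raw K score.

    Raw K greater than 19 -> "High-K"; 11 through 14 -> "Average-K";
    any other score belongs to neither comparison group (None).
    """
    if raw_k > 19:
        return "High-K"
    if 11 <= raw_k <= 14:
        return "Average-K"
    return None

# Hypothetical raw K scores
scores = [8, 11, 14, 15, 19, 20, 26]
print([k_group(s) for s in scores])
```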


Results. An indication of the stability of 
ratings of social desirability may be had by 
comparing the judgments given to the 11 
items common to both experiments. Table 1 
presents the necessary information. Differ- 
ences are small, except in the case of Item 
142, which receives a slightly undesirable 
rating in Experiment I but a clearly undesir- 
able rating in Experiment II. On the whole, 
the judgments appear to be stable. 

Of the 29 K items studied, 16 received 
median desirability ratings of 4.0 or less (un- 
desirable), 5 had median ratings of 6.0 or 
more (desirable), while the remaining 8 had 
ratings falling between 4.0 and 6.0. The ma- 
jority of the items, in other words, express 
attitudes judged to be undesirable. 


Table 1
Median Social Desirability Ratings of Items Common to Lists Used in Both Experiments

MMPI Booklet    Median Rating
Number          D-Sc List    K List
  5             5.0          4.8
  9             7.6          6.5
 39             2.6          2.6
 89             4.4          4.7
138             3.8          3.3
142             4.0          2.6
160             8.6          8.5
220             8.8          8.8
301             2.5          2.2
322             4.3          4.0
339             1.2          1.1




An estimate of the relationship between 
social desirability and probability of endorse- 
ment was obtained by correlating these vari- 
ables using probability of endorsement values 
from the typical college group. The product- 
moment correlation is .50. A positive rela- 
tionship exists, but its magnitude is consider- 
ably less than those found to hold for the D 
and Sc items used in Experiment I. 

The existence of this relationship might seem to indicate that K scores may be lowered if the response set is operating, but another possibility is present. In Experiment I, it might be assumed that the incidence of
schizophrenia and depression was negligible 
among the college students whose responses 
determined the endorsement values for the 
Sc and D items. It is quite likely, however, 
that the sample of students, whose responses 
yielded endorsement values for the K items, 
included a considerable number who had the 
set to answer items according to their social 
desirability. As far as the records of stu- 
dents lacking the set are concerned, the cor- 
relation between social desirability and prob- 
ability of endorsement might be negligible, 
but if an appreciable number of students have 
that set, their responses would create a posi- 
tive correlation between the two variables. 
Using K to indicate the presence of this re- 
sponse set, it is possible to compare the cor- 
relation between desirability and endorsement 
in a sample apparently lacking the set with 
the correlation obtained in a sample in which 
the set presumably is present. 

The set, insofar as it is measured by K, is 
absent in the “Average-K” sample. In this group, the correlation between desirability and endorsement is .38. The set supposedly is
present in the “High-K” group. In this sam- 
ple, the correlation is .66. From these data, 
therefore, it appears that individuals who re- 
ceive high K scores have a greater tendency 
to respond to items according to their social 
desirability than is the case with subjects ob- 
taining lower K scores. The validity of K as 
a measure of the response set is demonstrated 
by this evidence. 


3 When endorsement is based on Hathaway's data, the correlation is .59.


Charles Hanley 


If a scale is to measure the presence of the set, its scoring should be consistent with the social desirability of its items. Other things being equal, endorsement of desirable items and denial of undesirable items should characterize the test performance of defensive individuals. The scoring of the K items, however, is not entirely consistent with this scheme. All 29 K items in this experiment are keyed so that denial indicates the presence of the set. Five items (Booklet No. 160, 272, 296, 461, and 502) obtain desirability ratings of 6.0 or more. Yet answering these items “false” (i.e., denying desirable qualities) is scored as indicating the presence of the set.


This being the case, it might be hypothe- 
sized that the best separation between indi- 
viduals having the set and those lacking it 
would be produced by items whose scoring is 
consistent with their judged social desirabil- 
ity, in this instance the relatively undesirable 
K items. A rough test of this hypothesis 
is obtained by correlating social desirability 
with the differences in endorsement given by 
the “Average-K” and “High-K” samples to 
the individual items. The difference for each item is obtained by subtracting its “High-K” probability of endorsement from its “Average-K” value. The product-moment correlation between these differences and social desirability ratings is -.49, a result indicating
that the best separation between the two 
groups tends to be provided by K items 
judged to be relatively undesirable. It is just 
these items that are scored in a manner con- 
sistent with their social desirability ratings. 
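The rough test of the hypothesis has two steps. First, form each item's Average-K minus High-K difference in endorsement. Second, correlate those differences with desirability. The five items and their endorsement probabilities below are hypothetical. They were constructed so that the undesirable items separate the groups most, which is the pattern the negative correlation indicates.

```python
import math

def pearson_r(x, y):
    """Product-moment correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return cov / math.sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))

# Hypothetical K items: desirability rating and endorsement probability in each group
desirability = [1.5, 2.6, 3.3, 4.7, 8.5]
p_average_k = [0.40, 0.35, 0.30, 0.45, 0.80]
p_high_k = [0.10, 0.12, 0.15, 0.35, 0.78]

# Difference for each item: Average-K endorsement minus High-K endorsement
differences = [a - h for a, h in zip(p_average_k, p_high_k)]
print([round(d, 2) for d in differences])
print(round(pearson_r(desirability, differences), 2))
```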


Discussion 


The present study demonstrates that the 
relationship between social desirability and 
endorsement of items reported by Edwards 
(2) is not confined to his inventory. Indeed, 
one might venture to speculate that it would 
hold for most conventional personality tests. 

With such high correlations as exist between desirability and endorsement in the D and Sc items, the way clearly is open for misrepresentation by subjects. A number of alternatives are available for dealing with the problem. Edwards suggests as a remedy the






pairing of items on the basis of social desir- 
ability and the institution of a forced-choice 
technique of answering. The effectiveness of 
this ipsative approach has not yet been reported.

Another possibility may be useful and does 
not entail abandoning the traditional form of 
the personality inventory. Preliminary judg- 
ing of items will reveal those at the extremes 
of the social desirability dimension. To the 
extent that these can be discarded and rela- 
tively neutral items substituted in their place, 
the scale which results should be less vulner- 
able to manipulation. Neutral items may be 
few and far between, but the more there are, 
the better the scale will be. 

A third approach is to construct a sup- 
pressor scale which can be used to correct for 
the effects of the response set. Such a scale 
should consist of items heterogeneous in so- 
cial desirability to which probability of en- 
dorsement is unrelated in a population not 
having the set. To some extent, the K scale 
of the MMPI approximates this desired state 
of affairs. Its items vary in social desirabil- 
ity, and in a sample of individuals who ob- 
tained “average” K scores, the correlation 
between social desirability and probability of 
endorsement is relatively low. This is not 
the case with subjects obtaining “high” K 
scores; with these individuals the two vari- 
ables have a relatively high correlation. The 
scoring of the K scale, however, is not always 
consistent with the social desirability of its 
items. With a change in the scoring of five 
items, a somewhat different group of “highs” 
would have been discovered, and for them 
the correlation between social desirability and 
probability of endorsement might be even 
higher than was found using the present scor- 
ing key. 

If the K scale measures the set to respond
according to the social desirability of the 
traits expressed by the items, it may be asked 
why certain MMPI scales and not others are 
corrected by its use—for example, why K is 
added to Sc and not to D. In the present 
study, it appears that D is vulnerable to de- 
fensiveness, although not to the extent that is 
true for Sc. 

The MMPI procedure was based on em- 
pirical evidence. Meehl and Hathaway (4) 




found that the addition of K to the Sc scale 
reduced the number of “false negatives,” but 
no similar reduction was accomplished when 
K was used to correct D. A partial cause for 
this failure in the case of D lies in purely 
psychometric factors. There is an appreci- 
able item overlap between the K and D 
scales, 8 of the 30 K items being scored for 
D as well. When the scoring of an item on 
K is the opposite of its scoring on D, the net 
effect will be to cancel the item when K is 
added to D. When the common items are 
scored in the same manner for K as for D, 
the effect of adding K to D is to increase the 
weight given these items. In a scale as long 
as D (60 items), weighting items will not 
produce a useful increase in efficiency (3, p. 
447). 
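The cancelling and doubling described above can be shown with a toy scoring example. The item numbers and keys are hypothetical stand-ins, not the real D and K keys. Each key maps an item to the answer (True or False) that earns a point. K is added in full purely for illustration.

```python
def raw_score(answers, key):
    """Count keyed responses: one point whenever an answer matches the key."""
    return sum(1 for item, keyed in key.items() if answers.get(item) == keyed)

# Hypothetical keys: item number -> answer that earns a point
d_key = {1: True, 2: False, 3: True, 4: False}
k_key = {1: False, 2: False, 5: False}  # item 1 keyed opposite to D, item 2 keyed the same

def d_plus_k(answers):
    """D raw score with the full K raw score added (illustrative correction)."""
    return raw_score(answers, d_key) + raw_score(answers, k_key)

base = {1: True, 2: True, 3: False, 4: True, 5: True}

# Item 1 (opposite keys): the total is the same whichever way it is answered.
print(d_plus_k({**base, 1: True}), d_plus_k({**base, 1: False}))

# Item 2 (same keys): answering False now adds two points instead of one.
print(d_plus_k({**base, 2: True}), d_plus_k({**base, 2: False}))
```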

The fact that K is not used to correct cer- 
tain MMPI scales, therefore, does not indi- 
cate that scores on these dimensions cannot 
be influenced by a set to respond according 
to the social desirability of the items. It 
does not, on the other hand, demonstrate 
that the K scale fails to measure the presence 
of the set. What is suggested is that a re- 
vision of the K scale, which might involve 
the elimination of item overlap, a change in 
the scoring of certain items, and the addition 
of new items having the formal properties of 
suppressors could produce increased validity 
in the presently uncorrected MMPI scales. 


Summary 


Two experiments were described in which 
responses to items from three MMPI scales 
were related to the judged social desirability 
of these items. 

In Experiment I, ratings of the social de- 
sirability of random samples of 25 D and 32 
Sc items were correlated with the probabili- 
ties that the items would be endorsed when 
the MMPI was used as a personality test. 
The correlations between social desirability 
and probability of endorsement are .82 for 
the D items, and .89 for the Sc items. The 
scoring of both scales was found to be sys- 
tematically related to the social desirability 
of the items. 

In Experiment II, social desirability ratings 
and probabilities of endorsement were corre- 




lated using a sample of 29 of the 30 K items. 
In a typical college sample, these variables 
correlate .50. In an “Average-K” group, the 
correlation is .38, while in a “High-K” sample 
the correlation is .66. The results were inter- 
preted as demonstrating the validity of K as 
a measure of the set to respond to items in 
terms of their social desirability. It was 
pointed out, however, that the scoring of sev- 
eral K items was not consistent with their 
judged social desirability. Suggestions were 
made regarding possible improvements in the 
K scale and its application to scales presently 
uncorrected for by K. 


Received November 16, 1955. 




References 


1. Edwards, A. L. The scaling of stimuli by the 
method of successive intervals. J. appl. Psy- 
chol., 1952, 36, 118-122. 

2. Edwards, A. L. The relationship between the judged desirability of a trait and the probability that the trait will be endorsed. J. appl. Psychol., 1953, 37, 90-93.

3. Guilford, J. P. Psychometric methods. (2nd ed.) New York: McGraw-Hill, 1954.

4. Meehl, P. E., & Hathaway, S. R. The K factor as a suppressor variable in the Minnesota Multiphasic Personality Inventory. J. appl. Psychol., 1946, 30, 525-564.

5. Wiener, D. R. Subtle and obvious keys for the Minnesota Multiphasic Personality Inventory. J. consult. Psychol., 1948, 12, 164-170.


The Journal of Applied Psychology 
Vol. 40, No. 5, 1956 


The Relationship Between Attitude Toward the Army and 
the Acceptance Accorded QM Items of Issue¹


James F. Parker, Jr. and Ray C. Hackman 


University of Maryland 


The U. S. Army realizes the importance of 
issuing clothing and personal equipment to 
the soldier that will meet the subjective stand- 
ards set by the soldier, and therefore is faced 
with a problem similar to the consumer re- 
search problem in industry. The preferences 
of the soldiers for new articles proposed as 
future items of issue, and the acceptance ac- 
corded them, are assumed to play a large part 
in determining the extent to which the sol- 
dier will retain an article and use it in the 
prescribed manner. The Army maintains 
several centers which conduct extensive tests 
to determine both the physical characteristics 
of new articles of clothing and personal equip- 
ment and the subjective acceptance which 
will be accorded them. In testing for the 
relative acceptability of items of a given 
class the Army frequently uses the panel 
method with several large groups participat- 
ing in the test. Reactions to the articles and 
evaluations of them are solicited from the 
panel members by means of questionnaires. 


Problem 


In some of the tests conducted by the Survey Division of the Quartermaster Research and Development Field Evaluation Agency at Fort Lee, Virginia, subjects are selected from a pool of volunteers to fit whatever sizes are available in the articles to be tested. The pool of volunteers is usually large, since participants may sometimes keep the articles at the conclusion of the test. A potential source of sampling bias exists, however: the soldiers chosen as panel members constitute a very select sample and, as such, may not be representative of the entire Army population. Since these men are volunteers who participate in the test in addition to their other duties, it appears likely that they are quite favorably disposed toward the Army. The principal hypothesis of this study was that soldiers differing in their general attitude toward the Army regard items bearing the Army label differently, and that results obtained with such volunteer panels are therefore not applicable to the entire Army. In addition, the study was designed to test the effects of three other variables on soldier acceptance: length of service in the Army, level of education, and the size of the city or town in which the soldier had lived for the major portion of his life.

1 This study was conducted as a part of Contract DA 44-109qm-129 let by the Office of the Quartermaster General to the University of Maryland. This paper was reported in part at the annual meeting of the Eastern Psychological Association, April, 1953. Opinions expressed herein are those of the authors and do not necessarily represent those of the Office of the Quartermaster General or the U. S. Army.


Procedure 


Four hundred enlisted men at Fort Lee were ini- 
tially selected so as to represent as wide a range of 
service experience as possible. Examination of the 
biographical information on these men showed that 
the most meaningful categories into which these men 
could be grouped were as follows: basic trainees 
with fewer than eight weeks in the Army (Group 
I); soldiers with more than six months and less than 
two years of experience in the Army who had never 
been overseas (Group II); and combat veterans, 
each of whom had been in the Army for more than 
two years and had been awarded at least one battle 
star (Group III). 

Each of these groups was ordered with respect to 
general attitude toward the Army by means of a 
Guttman scale analysis, using a set of items re- 
ported previously (3). These items had been found 
to be scalable and in a pretest on a group of Na- 
tional Guardsmen were again found to be scalable. 
In order that the attitude groups used in the final analysis might be as different as possible, only three subsets of each experience category (Groups I, II, and III) were used: the top eight, who were very favorably disposed toward the Army; a middle eight, who were relatively neutral; and the bottom eight, who had a decided dislike for the Army. Thus there were 24 cases in each of Groups I, II, and III, or a total of 72 cases in the complete analysis.
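The Guttman procedure is described here only by reference to (3). As a rough illustration, the sketch below computes a coefficient of reproducibility for items scored 0/1 and then takes the top, middle, and bottom eight men of one experience group. It assumes a simple error-counting rule (each response that departs from the ideal pattern implied by the man's total score counts as one error) and assumes that the "middle eight" are the men closest to the center of the ordering; the function names are hypothetical and this is not presented as the authors' exact procedure.

```python
from typing import Dict, List, Sequence, Tuple


def reproducibility(responses: Sequence[Sequence[int]]) -> float:
    """Guttman coefficient of reproducibility: 1 - errors / (respondents * items).

    responses[i][j] is 1 if respondent i gave the favorable answer to item j, else 0.
    Errors are counted against the ideal pattern implied by each total score:
    a respondent with score s "should" endorse exactly the s most popular items.
    """
    n_items = len(responses[0])
    popularity = [sum(r[j] for r in responses) for j in range(n_items)]
    # Most frequently endorsed item first (ties keep their original order).
    order = sorted(range(n_items), key=lambda j: -popularity[j])
    errors = 0
    for r in responses:
        s = sum(r)
        ideal = [0] * n_items
        for j in order[:s]:
            ideal[j] = 1
        errors += sum(1 for j in range(n_items) if r[j] != ideal[j])
    return 1 - errors / (len(responses) * n_items)


def top_middle_bottom(scores: Dict[str, float], k: int = 8) -> Tuple[List[str], List[str], List[str]]:
    """Split one experience group into its top k, middle k, and bottom k by scale score."""
    ranked = sorted(scores, key=lambda name: scores[name], reverse=True)
    start = (len(ranked) - k) // 2  # assumed: the middle k are centered in the ordering
    return ranked[:k], ranked[start:start + k], ranked[-k:]
```

Applying top_middle_bottom to each of Groups I, II, and III yields the 3 x 3 x 8 = 72 cases used in the analysis.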

To determine the effects of these variables on acceptance, an instrument was devised to yield an objective acceptance score for each individual. As a first step, all similar attitude groups were combined. That is to say, the top eight from each of Groups I, II, and III were combined to form a single favorable group, and similarly for each of the other attitude groups. The resulting groups were essentially alike in all respects except attitude and were designated the F, M, and U groups, standing for favorable, middle or neutral, and unfavorable.

Next, a preference scale of Quartermaster items ordered along a metric was established. Pretesting suggested that the method of paired comparisons could not be used for this purpose, since it appeared to present an unrealistic task to the subject (S). Accordingly, the following system was devised. The soldier was given a list of 14 Quartermaster items, each quite familiar to him. He was first asked to select the three articles he considered best and the three he considered poorest and to give a reason for the selection of each. The reasons were asked for primarily as a device for forcing a more critical evaluation of the items. Second, the S was asked to rank, from best to worst, the top three and the bottom three items. For each S, then, the articles he placed in positions one, two, and three and in positions twelve, thirteen, and fourteen were known, but his ordering of the eight middle items was not. A rank of 7.5, the mean of the eight unreported positions (four through eleven), was arbitrarily assigned to each article not named. A preference scale for these 14 items was established by summing the rankings over all Ss and using Guilford's method (1) for achieving a metric from ranked data. Table 1 shows the preference scales established by each of the three attitude groups.
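The partial-ranking scheme can be sketched as follows. Each article a subject names among his best three receives 14, 13, or 12; each among his poorest three receives 1, 2, or 3; every unnamed article receives 7.5. Guilford's method (1) is not reproduced here. As a stand-in, the sketch assumes the common conversion of a mean rank R to the proportion p = (R - 0.5)/14, takes the normal deviate of p, and then expresses every value as a distance from the lowest, as was done for Table 1. All names are illustrative.

```python
from statistics import NormalDist
from typing import Dict, Sequence, Tuple

N_ITEMS = 14
# Positions 4..11 are never reported; each unnamed article gets their mean, 60 / 8 = 7.5.
UNNAMED_RANK = sum(range(4, 12)) / 8


def rank_values(best3: Sequence[str], worst3: Sequence[str], items: Sequence[str]) -> Dict[str, float]:
    """Rank values for one subject, coded so that 14 = most preferred (as in Table 1).

    best3 is ordered best first; worst3 is ordered worst first.
    """
    values = {item: UNNAMED_RANK for item in items}
    for pos, item in enumerate(best3):
        values[item] = N_ITEMS - pos      # 14, 13, 12
    for pos, item in enumerate(worst3):
        values[item] = 1 + pos            # 1, 2, 3
    return values


def scale_values(subjects: Sequence[Tuple[Sequence[str], Sequence[str]]],
                 items: Sequence[str]) -> Dict[str, float]:
    """Mean rank -> proportion -> normal deviate, then shift so the lowest item is 0.00."""
    totals = {item: 0.0 for item in items}
    for best3, worst3 in subjects:
        for item, v in rank_values(best3, worst3, items).items():
            totals[item] += v
    z = {}
    for item in items:
        mean_rank = totals[item] / len(subjects)
        p = (mean_rank - 0.5) / N_ITEMS   # assumed rank-to-proportion conversion
        z[item] = NormalDist().inv_cdf(p)
    lowest = min(z.values())
    return {item: z[item] - lowest for item in items}
```

Because mean ranks always lie between 1 and 14, p stays strictly between 0 and 1, so the inverse normal is always defined.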

The last section of the instrument was the part from which the individual's acceptance score was actually obtained. In this section, each S evaluated separately each of the 14 QM items used in setting up the preference scale. If, in his opinion, an item was generally superior to its civilian counterpart, he indicated his preference for the Army article. If, however, he could think of a similar civilian article which he considered better in most respects than the Army article, he indicated his preference for the civilian article. This appeared to be a realistic task and should have permitted all aspects of item acceptance to operate.

An objective acceptance score was determined for each individual by recording the scale value of each article chosen as superior to its civilian counterpart and then summing these scale values over all articles chosen. It should be noted that the scale values used in obtaining the acceptance scores were those established by the M group alone (those neutral on the attitude scale), since it was felt that scales produced by the other two groups would be contaminated by the extremes of attitude operating in these groups.
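Scoring the final section then reduces to a lookup and a sum. In the sketch below, the M-group scale values are transcribed from Table 1 (with decimal points restored); the soldier's answers are hypothetical, and the dictionary keys are simply the article names used in that table.

```python
from typing import Dict

# Scale values of the neutral (M) group, transcribed from Table 1.
M_SCALE: Dict[str, float] = {
    "Boots, russet": 1.46, "Undershirt": 1.14, "Trousers, OD": 1.10,
    "Shirt, cotton": 1.01, "Ike jacket": 0.92, "Jacket, HBT": 0.88,
    "Field jacket": 0.85, "Cap, HBT": 0.81, "Blanket, wool": 0.80,
    "Gloves, wool": 0.67, "Raincoat": 0.62, "Sweater": 0.61,
    "Necktie": 0.27, "Overcoat, wool": 0.00,
}


def acceptance_score(prefers_army: Dict[str, bool]) -> float:
    """Sum the M-group scale values of every article judged better than its civilian counterpart."""
    return sum(M_SCALE[item] for item, army_better in prefers_army.items() if army_better)
```

A soldier who preferred only the Army boots, trousers, and gloves would score 1.46 + 1.10 + 0.67 = 3.23. Note that choosing the overcoat adds nothing, since its scale value is .00.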


Results 


These acceptance scores went into the nine 
cells of a two-way classification table. In 
this table one dimension represented the 
three-way categorization based on length of 
service, and the other represented the three- 
way categorization based on the attitude 
measure. Table 2 shows the analysis of vari- 
ance computed from this table. 


Table 1
Preference Scales for QM Items Established Through the Rankings of Articles
by the Three Attitude Groups

Item    F Group           Scale    M Group           Scale    U Group           Scale
Rank                      Value                      Value                      Value
14*     Boots, russet     1.83     Boots, russet     1.46     Boots, russet     1.44
13      Ike jacket        1.55     Undershirt        1.14     Ike jacket        1.02
12      Trousers, OD      1.45     Trousers, OD      1.10     Shirt, cotton     1.00
11      Shirt, cotton     1.28     Shirt, cotton     1.01     Sweater           1.00
10      Field jacket      1.27     Ike jacket         .92     Undershirt        1.00
9       Blanket, wool     1.19     Jacket, HBT        .88     Blanket, wool      .93
8       Undershirt        1.13     Field jacket       .85     Field jacket       .91
7       Cap, HBT          1.01     Cap, HBT           .81     Cap, HBT           .80
6       Jacket, HBT        .97     Blanket, wool      .80     Trousers, OD       .76
5       Sweater            .92     Gloves, wool       .67     Jacket, HBT        .69
4       Gloves, wool       .82     Raincoat           .62     Gloves, wool       .66
3       Raincoat           .74     Sweater            .61     Raincoat           .57
2       Necktie            .49     Necktie            .27     Necktie            .56
1       Overcoat, wool     .00     Overcoat, wool     .00     Overcoat, wool     .00

* The highest number represents the article most preferred.




Table 2
Results of Analysis of Variance of Acceptance Scores

Source                 df    Mean Square    F          P
Attitude                2    98.7218        21.199*    <.01
Length of Service       2     6.2524         1.342     >.05
Interaction (A x L)     4     1.9251
Error                  63     4.6567
Total                  71

* Critical F (df = 2, 60) = 4.98 at the 1% level of significance.


Examination of the analysis of variance table shows the effect of attitude differences on acceptance to be highly significant. This substantiates the initial hypothesis that there is a definite quantitative difference in the acceptance accorded QM items of issue by soldiers representing different portions of the attitude continuum. Generalizing from this, we would expect soldiers who have a low regard for the Army to dislike items of issue bearing the Army label. This finding reaffirms a suspicion that the acceptance of QM items is based in part on things other than simply the quality and/or utility of the items.
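The analysis in Table 2 is an ordinary fixed-effects two-way analysis of variance with eight scores in each of the nine attitude-by-service cells. A minimal sketch, assuming equal cell sizes as in this design, shows how the degrees of freedom in Table 2 (2, 2, 4, and 63) arise and how each F is formed against the within-cells error. The individual acceptance scores were not published, so any data passed to it are invented.

```python
from typing import Dict, Sequence, Tuple


def two_way_anova(cells: Sequence[Sequence[Sequence[float]]]) -> Dict[str, Tuple[float, ...]]:
    """Fixed-effects two-way ANOVA with equal cell sizes.

    cells[a][l] holds the acceptance scores for attitude level a and service level l.
    Returns df, sums of squares, mean squares (attitude, service, interaction, error)
    and F ratios (attitude, service, interaction), each tested against the error term.
    """
    n_a, n_l, n = len(cells), len(cells[0]), len(cells[0][0])
    scores = [x for row in cells for cell in row for x in cell]
    big_n = len(scores)
    grand = sum(scores) / big_n
    cell_mean = [[sum(cell) / n for cell in row] for row in cells]
    a_mean = [sum(row) / n_l for row in cell_mean]
    l_mean = [sum(cell_mean[a][l] for a in range(n_a)) / n_a for l in range(n_l)]

    ss_a = n * n_l * sum((m - grand) ** 2 for m in a_mean)
    ss_l = n * n_a * sum((m - grand) ** 2 for m in l_mean)
    ss_cells = n * sum((m - grand) ** 2 for row in cell_mean for m in row)
    ss_ax = ss_cells - ss_a - ss_l          # interaction = between-cells minus main effects
    ss_err = sum((x - cell_mean[a][l]) ** 2
                 for a in range(n_a) for l in range(n_l) for x in cells[a][l])

    df = (n_a - 1, n_l - 1, (n_a - 1) * (n_l - 1), big_n - n_a * n_l)
    ss = (ss_a, ss_l, ss_ax, ss_err)
    ms = tuple(s / d for s, d in zip(ss, df))
    f = tuple(m / ms[3] for m in ms[:3])
    return {"df": df, "ss": ss, "ms": ms, "f": f}
```

With 72 scores in a 3 x 3 layout, the error term has 72 - 9 = 63 degrees of freedom, which is why the footnote to Table 2 quotes the tabled F for 2 and 60 degrees of freedom as the nearest available entry.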

Length of service (as defined) was not a significant source of variation. It would seem that a long period of time in the Army, and the consequently greater experience with military equipment, has little effect on the way in which the soldier evaluates the articles of clothing and personal equipment he uses and wears. If the initial acceptance level of an article is low, mere passage of time will not increase its acceptability for the user.

No significant interaction effect between 
attitude differences and length of service was 
observed. 

The relationship between level of education and acceptance, and that between acceptance and the size of the town in which the respondent had previously lived, were each tested by means of a coefficient of contingency. In both instances virtually no relationship was found.
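For the education and home-town comparisons, the coefficient of contingency is C = sqrt(chi-square / (chi-square + N)), computed from a cross-tabulation of categories. The categories used in the study are not reported, so the sketch below is illustrative and any counts passed to it are hypothetical.

```python
from math import sqrt
from typing import Sequence


def contingency_coefficient(table: Sequence[Sequence[float]]) -> float:
    """Pearson's C for an r x c table of counts (every row and column total must be > 0)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    return sqrt(chi2 / (chi2 + n))
```

C is 0 when the two classifications are independent. Its upper limit is below 1 and depends on the table size (about .71 for a 2 x 2 table), so values from tables of different sizes are not directly comparable.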

Possibly the most important finding of this study was that significant differences exist among the preference scales established by the three attitude groups. Variances of the metric values for each article in the preference scales established by the attitude groups were examined by means of an L1 test (2), and significant heterogeneity was found. This suggests that the relationship of attitude and acceptance varies from item to item. Soldiers on the favorable end of the attitude continuum differed in their judgments as to what constitutes good and bad items of issue from soldiers who were unfavorably disposed toward the Army. Table 3 shows the three articles most affected by attitude differences and the three least affected. Note that articles from all positions on the preference scales are included in both groups. In particular, the top group, those articles most affected, includes items at all levels of acceptance.

Table 3
High and Low Variances for Articles on the Preference Scale

                                                   Rank-Order Position by Group
Items Most Affected by Attitude     Variance    F              M              U
1. Trousers, OD                     .0569       12* (.40)**    12 (.30)        6 (-.06)
2. Sweater, high neck               .0411        5 (-.13)       3 (-.18)      11 (.19)
3. Ike jacket                       .0377       13 (.49)       10 (.12)       13 (.20)

Items Least Affected by Attitude    Variance    F              M              U
1. Gloves                           .0022        4 (-.22)       5 (-.13)       4 (-.16)
2. Cap, HBT                         .0006        7 (-.04)       7 (.01)        7 (-.02)
3. Shirt, cotton                    .0002       11 (.23)       11 (.21)       12 (.19)

* High numbers represent high position on the preference scale.
** Numbers in parentheses are metric values obtained through the scaling procedure (1). These values are in normally distributed form. When presented in Table 1, each scale value, for each group, was expressed as a distance from the lowest value obtained for that group.

It is also apparent from an inspection of this table that one must attend to more than the rank ordering of articles in order to evaluate the effect of attitude. For example, some items, such as the overcoat and necktie, were consistently ranked, on the average, as poorest and next to poorest by all three groups. These items did not appear in the section marked “Items least affected by attitude,” however, because even though the group rankings were consistent, differing variabilities within the groups produced different metric positions for the items. Here it should again be noted that the metric scores used in the L1 test for homogeneity of variance were in standard score units, prior to each scale value being expressed as a distance from the lowest scale value in the group.
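The L1 statistic of Neyman and Pearson compares the weighted geometric mean of a set of sample variances with their weighted arithmetic mean. The sketch below assumes that each article's variance is the maximum-likelihood variance of its three group metric values (so n = 3 for every article). It does not compute critical values, which must be taken from published tables. The six variances listed in Table 3 are used only as an illustration, since the study tested all 14 articles.

```python
from math import exp, log
from typing import Sequence


def neyman_pearson_l1(variances: Sequence[float], sizes: Sequence[int]) -> float:
    """L1 = weighted geometric mean / weighted arithmetic mean of the sample variances.

    Each variance is assumed to be a maximum-likelihood estimate (divisor n_i) from a
    sample of size n_i. L1 lies in (0, 1] and equals 1 only when all variances are
    equal; small values indicate heterogeneity of variance.
    """
    total = sum(sizes)
    log_geo = sum(n * log(v) for v, n in zip(variances, sizes)) / total
    arith = sum(n * v for v, n in zip(variances, sizes)) / total
    return exp(log_geo) / arith
```

Because the geometric mean never exceeds the arithmetic mean, the further L1 falls below 1, the more heterogeneous the variances.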


Summary 


A questionnaire designed to elicit a general attitude toward the Army was administered to 400 soldiers at Fort Lee, Virginia. From this sample three smaller groups of 24 each were selected so as to represent those very favorable toward the Army, those relatively neutral, and those definitely disposed against Army life. In addition, these groups were selected so as to encompass a wide range with respect to length of service. These soldiers were then asked to evaluate 14 articles of standard QM issue in a manner designed to indicate the relative acceptance accorded each article.

Results of the study indicate that differences in soldiers' general attitude toward the Army produce changes in the manner in which they evaluate the articles they are issued to use and wear. This probably influences the results obtained in an acceptance testing program. Random or stratified random sampling should replace the use of volunteers in selecting acceptance testing subjects. This would provide a suitable control for this variable and permit generalization of results to the entire Army population with a higher degree of validity.
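The recommended stratified random sampling can be illustrated with proportional allocation, in which the same fraction is drawn at random from each stratum of a unit roster. The roster fields and the choice of stratifying variable below are hypothetical. Any characteristic known for every soldier, such as length of service, could serve as the stratum.

```python
import random
from collections import defaultdict
from typing import Dict, List, Sequence


def stratified_sample(roster: Sequence[dict], stratum_key: str, fraction: float,
                      seed: int = 0) -> List[dict]:
    """Proportional stratified random sample: draw the same fraction from every stratum."""
    rng = random.Random(seed)
    strata: Dict[object, List[dict]] = defaultdict(list)
    for soldier in roster:
        strata[soldier[stratum_key]].append(soldier)
    sample: List[dict] = []
    for members in strata.values():
        k = max(1, round(fraction * len(members)))  # at least one man per stratum
        sample.extend(rng.sample(members, k))
    return sample
```

Unlike a volunteer panel, every soldier on the roster has a known, nonzero chance of selection, and each stratum is represented in proportion to its size.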

In addition, these results suggest that for 
some QM articles more than others, psycho- 
logical factors such as general attitude to- 
ward the Army are markedly influential in 
determining acceptance. These articles, then, 
are those for which acceptance testing is most 
essential. They are the ones, relatively 
speaking, for which the determination of only 
quality or utility is inadequate for the proper 
evaluation of the item. 


Received December 12, 1955. 


References 


1. Guilford, J. P. Psychometric methods. New York: McGraw-Hill, 1936.

2. Johnson, P. O. Statistical methods in research. New York: Prentice-Hall, 1949.

3. Stouffer, S. A., Guttman, L., Suchman, E. A., Lazarsfeld, P. F., Star, S. A., & Clausen, J. A. Measurement and prediction. Vol. IV. Studies in social psychology in World War II. Princeton: Princeton Univer. Press, 1950.




The Journal of Applied Psychology 
Vol. 40, No. 5, 1956 


Teacher Attitudes and Temperament as a Measure of 
Teacher-Pupil Rapport 


Carroll H. Leeds 


Furman University 


One approach to the study of teacher per- 
sonality is by way of psychometrics. The 
present article reports a study of teacher 
personality in which use was made of two 
psychometric instruments: the Minnesota 
Teacher Attitude Inventory (MTAI) and 
the Guilford-Zimmerman Temperament Sur- 
vey (GZTS). 

The MTAI (3) was designed to predict 
the social-emotional climate that a teacher 
will maintain in the classroom. It was as- 
sumed in its construction that a teacher’s 
attitudes toward pupils and pupil behavior 
provide an index to the teacher’s personality, 
and thus to his ability in establishing and 
maintaining desirable interpersonal relation- 
ships in the classroom. The MTAI has been 
shown to measure teacher-pupil rapport with 
validity coefficients ranging from .46 to .63 
(3, 6). 

The principal object of the present investigation was to determine somewhat more definitely those factors in personality and temperament that the MTAI is measuring. The instrument employed toward this end was the GZTS (5). A product of factor analysis, this instrument was designed to identify and measure ten traits of temperament: General Activity (G), Restraint (R), Ascendance (A), Sociability (S), Emotional Stability (E), Objectivity (O), Friendliness (F), Thoughtfulness (T), Personal Relations (P), and Masculinity (M). Any relationships noted between scores on these traits and MTAI scores would provide some indication of which temperament traits tend to characterize teachers who maintain harmonious relations with pupils, and which characterize teachers who do not get along well with pupils.

The MTAI and the GZTS, in this order, were administered to 300 public school teachers in one of South Carolina's largest cities, with a metropolitan population of approximately 170,000. Each teacher had been approached individually and asked to cooperate. Three hundred teachers, from the first grade through the twelfth, agreed to participate in the study.

Scores on the MTAI ranged from a plus 
120 to a minus 102, with a mean of 28.6, a 
median of 30, and a standard deviation of 
42.8. 


Correlation of Traits with MTAI 


Product-moment correlation coefficients were 
obtained between MTAI scores and scores 
for each of the ten traits of temperament 
measured by the GZTS. The only trait for 
which the data were treated separately ac- 
cording to sex (270 women and 30 men) was 
Masculinity (M). This was necessary be- 
cause of the sex differential in the norms for 
this trait. 
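Each entry in Table 1 is an ordinary product-moment r. The sketch below computes r and the usual t statistic for testing it against zero, t = r sqrt(n - 2) / sqrt(1 - r^2), with n - 2 degrees of freedom. It is offered only as a check on the significance statements, since the raw scores are not available. The critical value of about 2.59 mentioned below is the approximate two-tailed 1% point for roughly 300 degrees of freedom.

```python
from math import sqrt
from statistics import mean
from typing import Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Product-moment correlation between paired scores."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def t_for_r(r: float, n: int) -> float:
    """t statistic for the hypothesis that the population correlation is zero."""
    return r * sqrt(n - 2) / sqrt(1 - r * r)
```

For example, t_for_r(0.52, 300) is about 10.5, far beyond 2.59, while t_for_r(0.06, 300) is about 1.04. This agrees with P being significant and G and R not.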

The correlation coefficients are presented in Table 1. All are significant at the 1% level of confidence except three: those for traits G, R, and T. The only negative coefficient (-.07), statistically nonsignificant, is that for T. The traits most closely related to MTAI scores, or to teacher-pupil rapport, are P (Personal Relations), F (Friendliness), O (Objectivity), and E (Emotional Stability). Trait P shows the highest relationship (.52). This finding agrees with observations concerning this trait in the GZTS Manual. According to the test authors:

Of all the scores, this one has consistently correlated highest with all criteria involving human relations. It seems to represent the core of “getting along with others.” . . . A high score means tolerance and understanding of other people and their human weaknesses. A low score indicates fault-finding and criticalness of other people and of institutions generally. The low-scoring person is not likely to “get along with others” (5, p. 9).


There is definite indication, then, that teachers who get along well with pupils tend to be cooperative, friendly, objective, and emotionally stable, and, to a lesser degree, manifest sociability, social ascendancy, and masculinity in emotions and interests. Those who do not have high rapport with pupils, on the other hand, tend to be critical and intolerant, hostile and belligerent, hypersensitive, depressed, and emotionally unstable. To a lesser degree, they tend toward submissiveness, shyness, seclusiveness, and femininity. The results also indicate that, to a certain extent, the MTAI score is an indirect measure of these temperament traits.

Table 1
Correlations Between Temperament Trait Scores and MTAI Scores

Trait                          Correlation
General Activity (G)            .06*
Restraint (R)                   .06*
Ascendance (A)                  .19
Sociability (S)                 .20
Emotional Stability (E)         .36
Objectivity (O)                 .44
Friendliness (F)                .36
Thoughtfulness (T)             -.07*
Personal Relations (P)          .52
Masculinity (M)
   Women (N = 270)              .16
   Men (N = 30)                 .50

* All correlations are significant at the 1% level except those marked with an asterisk.


Item Analysis 


To further substantiate and clarify these findings, a rough item analysis was made by comparing the GZTS item responses of the teachers in the highest and lowest 25% of the distribution of MTAI scores. Scores for the 75 teachers in the upper group range from 120 to 61 and for the 75 teachers in the lower group from -3 to -102. For each of the ten traits, Table 2 presents the number and percentage of the trait's 30 items that discriminate between these two groups. The results agree fairly closely with the correlations obtained: the highest frequencies of discrimination are found with those traits correlating highest with MTAI scores, and the lowest frequencies with those traits showing the least correlation. For example, trait P (Personal Relations), which correlates .52 with the MTAI, has 77% of its items showing discrimination in the same direction as the correlation, none showing discrimination in the opposite direction, and 23% showing no discrimination.
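The rough item analysis can be sketched as follows. The paper does not say how large a difference between the upper and lower groups counted as discrimination, so the 10-point threshold (min_diff) is a hypothetical choice. Responses are coded 1 when they add to the trait score, and direction is judged relative to the sign of the trait's correlation with the MTAI.

```python
from typing import Dict, List, Sequence, Tuple


def split_extremes(scores: Dict[str, float], fraction: float = 0.25) -> Tuple[List[str], List[str]]:
    """Return the teachers in the top and bottom `fraction` of the MTAI distribution."""
    ranked = sorted(scores, key=lambda t: scores[t], reverse=True)
    k = round(len(ranked) * fraction)
    return ranked[:k], ranked[-k:]


def endorsement(group: Sequence[Sequence[int]], item: int) -> float:
    """Proportion of a group giving the keyed (trait-scored) response to an item."""
    return sum(person[item] for person in group) / len(group)


def classify_item(p_upper: float, p_lower: float, r_sign: int, min_diff: float = 0.10) -> str:
    """'with', 'opposite', or 'none' relative to the sign of the trait-MTAI correlation."""
    d = (p_upper - p_lower) * r_sign
    if d >= min_diff:
        return "with"
    if d <= -min_diff:
        return "opposite"
    return "none"
```

With the percentages quoted in the discussion of Trait T (r = -.07), classify_item(0.67, 0.44, -1) returns "opposite" for the "complicated problems" item, and classify_item(0.39, 0.76, -1) returns "with" for the "morals" item.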

The behavior of some of the items in traits correlating low with the MTAI deserves some explanation and warrants some examination of item content.

Table 2
Number and Per Cent of the 30 Items for Each of 10 Traits Discriminating Between the
Upper and Lower 25% of MTAI Scores

             Discrimination in        Discrimination Opposite    No
             Correlation Direction    to Correlation             Discrimination
Trait        N       %                N       %                  N       %
G            15      50               6       20                 9       30
R            13      43               6       20                 11      37
A            19      64               1       3                  10      33
S            22      73               2       7                  6       20
E            27      90               0       0                  3       10
O            28      93               0       0                  2       7
F            24      80               1       3                  5       17
T            12      40               10      33                 8       27
P            23      77               0       0                  7       23
M (Men)      9       30               2       7                  19      63
M (Women)    12      40               8       27                 10      33

Trait T (Thoughtfulness, or Thinking Introversion), which correlates -.07 with MTAI scores, shows as many as 40% of its items discriminating in the direction of a negative correlation. The item “You enjoy thinking out complicated problems” is agreed to more frequently (67% vs. 44%) by high MTAI scorers than by low scorers. Such agreement contributes to a positive correlation. But a number of other items in the T category, such as the following, contribute to the obtained negative correlation: “You are much concerned over the morals of your generation.” As with the previous item, a “yes” response adds to the T score, but with this item only 39% of the high MTAI scorers respond “yes,” as contrasted with 76% of the low scorers. Previous work with the MTAI (2) has identified a so-called Pharisaic-virtue attitude which characterizes teachers scoring low on the MTAI. This attitude seems to involve a perfectionistic and dogmatic adherence to rigidly established moral principles. This item, among others in the GZTS, apparently taps this attitude, with a resultant minimizing effect upon the measure of the intended trait, Thoughtfulness. This effect undoubtedly contributes to the low, even slightly negative, correlation of MTAI scores with the T trait.

The same effect seems to operate with at least two other traits in the GZTS. Trait G includes the following item: “It is hard to understand why many people are so slow and get so little done.” A “yes” response to this item contributes to a higher G score, but at the same time it expresses an attitude of hostility toward others which has also been found to characterize teachers scoring low on the MTAI (2). An item in the R category (Restraint), which discriminates in favor of the low MTAI scorer, reads as follows: “It is difficult for you to understand how some people can be so unconcerned about the future.” Agreement with the item contributes to the R score, but high MTAI scorers tend to disagree. In addition to seriousness of disposition, this item could tap such attitudes as hostility and Pharisaic virtue, or even indicate a general neurotic condition.


Finally, a lack of uniformity is noted in the behavior of the items in Trait M (Masculinity). Item 175 in the M trait reads: “When a parent, teacher, or boss scolds you, you feel like weeping.” In addition to femininity, this item could very well tap hypersensitivity or neuroticism in general. Consider the item: “You feel very badly if someone does not approve of what you are wearing.” Agreement is supposed to indicate femininity. Could it not indicate neurotic sensitivity as well? Agreement with these two items was a more frequent response among low MTAI scorers than among those with higher scores. The following items operate in the same way among the women teachers, further raising the question as to just what is motivating a “yes” or a “no” response: “You especially dislike to get your hands dirty or greasy.” “The sight of ragged or soiled fingernails is repulsive to you.” “You can handle a loaded gun without feeling at all jittery.” “You cry rather easily.” “The sight of an unshaven man disgusts you.” “When you become emotional you come to the point of tears.” When the low MTAI scorers agree with these statements, it is quite possible that hostility, prudishness, and nervous instability, rather than femininity as such, represent the essential motivating conditions. Among both men and women respondents, between 50% and 75% of the M items discriminating in the direction of the correlation are of this nature.

The same process appears to be operating 
among M items discriminating in the direc- 
tion opposite to that of the correlation as 
indicated in the following item: “You feel 
strongly against kissing a friend of your own 
sex and age.” Although agreement con- 
tributes to the Masculinity score, the low- 
scoring MTAI teachers agree more frequently 
than do the high scorers. 

This matter of item content and the psychological processes involved in item response could account at least partly for the discrepancy between the results of this investigation and those of Cook and Medley (1), in which high MTAI scorers showed more femininity as measured by the Minnesota Multiphasic Personality Inventory (MMPI). However, Cook and Medley (1) did find results similar to those of the present investigation when they compared MTAI scores with Depression and Social Introversion as measured by the MMPI.

Notwithstanding these comments concern- 
ing some of the GZTS items, correlational 
procedure and item analysis do indicate that 
the temperaments of teachers who have high 
rapport with pupils are characterized by per- 
sonal cooperativeness, friendliness, objectiv- 
ity, and emotional stability. Somewhat less 
obvious are the traits of sociability, ascend- 
ance, and masculinity. 


Temperament Profiles 


It is realized, of course, that personality structure is configurational and that a mere identification and summation of independently conceived traits does not provide an adequate personality picture. With this in mind, a somewhat rough attempt was made to consider GZTS scores as a whole or pattern. The trait scores showing the closest relation to MTAI scores, namely P, F, O, and E, were converted to T scores (standard scores with a mean of 50 and a standard deviation of 10), summed, and averaged into a composite score, which was then correlated with MTAI scores. The resulting correlation was .52, actually no higher than that obtained for P alone. A multiple correlation possibly would have shown a somewhat higher relationship.
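A sketch of the composite follows. It assumes linear T scores computed from this sample's own mean and standard deviation; the paper does not say whether the manual norms were used instead. The trait labels are those of the GZTS, and the input layout is hypothetical.

```python
from statistics import mean, pstdev
from typing import Dict, List, Sequence


def to_t_scores(raw: Sequence[float]) -> List[float]:
    """Linear T scores: mean 50, standard deviation 10, relative to this sample."""
    m, sd = mean(raw), pstdev(raw)
    return [50 + 10 * (x - m) / sd for x in raw]


def pfoe_composite(trait_scores: Dict[str, Sequence[float]],
                   traits: Sequence[str] = ("P", "F", "O", "E")) -> List[float]:
    """Average each teacher's T scores on the selected traits (one value per teacher)."""
    t = {name: to_t_scores(trait_scores[name]) for name in traits}
    n_teachers = len(t[traits[0]])
    return [mean(t[name][i] for name in traits) for i in range(n_teachers)]
```

Converting to T scores first keeps a trait with a larger raw-score spread from dominating the average. The composite can then be correlated with MTAI scores like any other variable. Because P, F, O, and E are themselves intercorrelated, a composite need not exceed the best single trait, which is consistent with the .52 reported.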

Examination of individual GZTS Profile Charts for the highest and lowest 25% of teachers responding to the MTAI provided information supporting in general the findings thus far discussed. Sixty per cent of the 75 teachers with high MTAI scores, as compared with only 16% of the low scorers, show relative elevations on the chart in traits EOFP and relative “dips” in traits T and M (women subjects).¹ Seventy-six per cent of the 75 low scorers on the MTAI, as compared with approximately 30% of the high scorers, give profile patterns with relative elevations on T and M (women subjects) and relative “dips” on one or more (usually more) of traits EOFP.


1 Higher scores on the masculinity continuum are 
at the bottom of the chart for women and at the 
top for men. 




No discernible differences between the two 
groups were observed from the GRAS pro- 
files. Slightly over two-thirds of each group 
showed peaks on R with “dips” on A and S. 
One-sixth of each group indicated “dips” on 
R with rises on A and S. 


Comparison of Teacher Sample with Norms 


It is of interest to compare the ten tem- 
perament trait means of the present sample 
of 300 teachers with the norm means pre- 
sented in the Manual (5). With the excep- 
tion of Trait T (Thoughtfulness), the test 
norms are based upon scores obtained from 
523 college men and 389 college women. A 
number of veterans were included in the male 
sample. The norms for the T items are based 
on scores obtained from a group of high 
school seniors and their parents (N = 252). 

Results, presented in Table 3, show that 
differences between means of the teacher 
group and norm group are statistically sig- 
nificant at the 1% level for all traits except 
S, T, and M. The teachers’ mean scores are 
higher for traits R, E, O, F, and P and lower 
for traits G and A. 
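Table 3 reports t ratios for the teacher-norm differences. The standard deviations behind them are not given, so the sketch below shows only the usual formula for two independent means, t = (M1 - M2) / sqrt(s1^2/n1 + s2^2/n2). Any standard deviations supplied to it would be hypothetical. For most traits n1 = 300 and, if the 523 men and 389 women were pooled, n2 = 912; for T the norm group has n2 = 252.

```python
from math import sqrt


def t_mean_difference(m1: float, sd1: float, n1: int,
                      m2: float, sd2: float, n2: int) -> float:
    """t for the difference between two independent means, using the standard error
    of the difference sqrt(sd1^2 / n1 + sd2^2 / n2)."""
    se_diff = sqrt(sd1 ** 2 / n1 + sd2 ** 2 / n2)
    return (m1 - m2) / se_diff
```

The sign follows the difference (teacher minus norm). Table 3 lists t as an absolute value, so G, with a difference of -1.2, appears as 3.24.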

On the basis, then, of results obtained in the present study, teachers as a group differ from college seniors in showing more restraint and seriousness, greater emotional stability and objectivity, more friendliness, and a more cooperative spirit in personal relations. College seniors, on the other hand, possess more drive and energy and show more social boldness. No difference is noted between the groups in sociability, reflectiveness, and masculinity.

Table 3
Mean GZTS Scores for Teacher Group and Norm Group

Trait        Teacher Group   Norm Group   Diff.    t
G            15.8            17.0         -1.2      3.24**
R            19.9            16.4          3.5     12.31**
A            12.5            15.0         -2.5      7.24**
S            19.2            18.8          0.4      1.05
E            19.8            16.3          3.5      9.23**
O            19.7            17.4          2.3      6.98**
F            19.5            14.6          4.9     15.21**
T            17.7            18.2         -0.5      1.18
P            19.8            17.1          2.7      8.63**
M (Men)      18.9            19.9         -1.0      1.50
M (Women)    10.3            10.8         -0.5      1.60

** Significant at 1% level.


Received November 28, 1955. 


References 


1. Cook, W. W., & Medley, D. M. The relation- 
ship between Minnesota Teacher Attitude In- 
ventory scores and scores on certain scales of 
the Minnesota Multiphasic Personality Inven- 
tory. J. appl. Psychol., 1955, 39, 123-129. 


2. Cook, W. W., & Medley, D. M. Proposed hostil- 
ity and Pharisaic-virtue scales for the MMPI. 
J. appl. Psychol., 1954, 38, 414-418. 

3. Cook, W. W., Leeds, C. H., & Callis, R. Minne- 
sota Teacher Attitude Inventory. New York: 
Psychological Corporation, 1951. 

4. Guilford, J. P. Fundamental statistics in psychology and education. New York: McGraw-Hill, 1950.

5. Guilford, J. P., & Zimmerman, W. S. The Guil- 
ford-Zimmerman Temperament Survey. Bev- 
erly Hills, Calif.: Sheridan Supply Co., 1949. 

6. Leeds, C. H. A scale for measuring teacher-pupil 
attitudes and teacher-pupil rapport. Psychol. 
Monogr., 1950, 64, No. 6 (Whole No. 312). 


The Journal of Applied Psychology 
Vol. 40, No. 5, 1956 


Differential Interest Patterns in Salesmen¹ ²


Arthur A. Witkin 
Queens College 


That salesmen of different types may have 
different interest patterns is evident from the 
work of Strong (2), Flemming (1), and 
others. Detailed knowledge of these patterns 
can contribute to the effectiveness of counsel- 
ing services for students, as well as to the 
effectiveness of business concerns in selecting 
salesmen. Three hundred successful sales- 
men, divided equally among specialty sales- 
men, route salesmen, and sales engineers, and 
drawn from 22 different companies, were 
studied for significant differences in interest 
patterns on the Strong Vocational Interest Blank for Men.


Procedure 


The data for this study were derived from Strong Interest Blanks completed by salesmen as part of the process of applying for a sales position with one of the 22 companies involved. Before taking the Strong test as one of a battery of tests, the applicants had been preliminarily screened by means of application forms and personal interviews.

The scores included in this study were for those salesmen who were later judged by their sales managers, after a year or more of service, to be effective salesmen. The managers were asked this question: “Would you re-hire this man if he were a new applicant and you knew as much about him as you now do?” A positive answer was the criterion of effectiveness.

The scores studied were for the following nine scales of the Strong Blank: Sales Manager, Real Estate Salesman, Life Insurance Salesman, Production Manager, Personnel Manager, Accountant, Office Worker, Purchasing Agent, and Advertising Man. The six nonselling fields in this group were selected for inclusion in this study as representing a partial sampling of the kinds of customers and collateral activities with which salesmen become involved. Subsequent research will include such other interest scales as that of engineer, an occupation of which sales engineers are a part. The statistical approach was the analysis of variance, used to determine differences among the three groups.

1 This material is derived from a portion of the author's doctoral dissertation completed at New York University. Appreciation is expressed to the author's chief advisor, Dr. Brian E. Tomlinson.

2 Correspondence relating to the material in this article should be addressed to the author at 440 East 20th Street, New York 9, N. Y.


Results 


An analysis of variance was undertaken for 
each of the nine Strong scales involved. At 
the .01 level of significance, differences among 
the three groups as detailed in Table 1 were 
found on the scales for production manager, 
accountant, office worker, purchasing agent, 
real estate salesman, and life insurance salesman. At the .05 level, differences were found on the personnel manager scale.
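The article reports only the resulting F ratios. As a sketch of the one-way analysis of variance run on each scale, the function below computes F from raw scores. The short score lists are invented for illustration and are not the study's data; with the actual 300 men in three groups of 100, the degrees of freedom would be (2, 297).

```python
from statistics import mean

def one_way_anova(groups):
    """Return (F, df_between, df_within) for a one-way analysis of variance.

    groups -- a list of lists, one list of scores per group.
    """
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = mean([x for g in groups for x in g])
    group_means = [mean(g) for g in groups]
    # Between groups: squared deviation of each group mean from the grand mean, weighted by group size.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means))
    # Within groups: squared deviation of each score from its own group mean.
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, group_means) for x in g)
    df_between, df_within = k - 1, n_total - k
    f_ratio = (ss_between / df_between) / (ss_within / df_within)
    return f_ratio, df_between, df_within

# Hypothetical Production Manager standard scores, five men per group (invented, not study data).
specialty = [38, 41, 36, 40, 39]
route = [40, 42, 39, 41, 38]
engineers = [46, 44, 47, 43, 45]

f_ratio, df_b, df_w = one_way_anova([specialty, route, engineers])
print(f"F({df_b}, {df_w}) = {f_ratio:.2f}")
```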

For the seven scales where significant dif- 
ferences were found, Table 1 indicates the re- 
sult of further F tests to determine where the 


Table 1 


Results of F Tests for Seven Strong Scales Taking the
Three Sales Groups Two at a Time

Scale                       F_AB          F_AC       F_BC
Production Manager          1.42          24.29**    13.96**
Personnel Manager            .95           2.90      (illegible)
Accountant                  4.46*          4.34*     17.60**
Office Worker              13.19**        22.86**    70.77**
Purchasing Agent           (illegible)     1.36      19.29**
Real Estate Salesman         .24          12.84**     9.59**
Life Insurance Salesman      .27          23.15**    28.42**

* Significant at the .05 level.
** Significant at the .01 level.
Note. A = Specialty salesmen, B = Route salesmen, C = Sales engineers.
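With only two groups in a comparison, the analysis-of-variance F equals the square of the pooled-variance t, so each entry above can be read as a two-group test. The sketch below shows that identity on invented scores and marks F ratios with the table's asterisk convention. The critical values 3.89 and 6.76 are approximate table values for F(1, 198) and assume each pairwise test used only the 200 men in the two groups compared; the article does not state the error term used.

```python
from math import sqrt
from statistics import mean, variance

def two_group_f(a, b):
    """One-way ANOVA F for exactly two groups; df = (1, len(a) + len(b) - 2)."""
    mean_a, mean_b = mean(a), mean(b)
    grand = mean(a + b)
    ss_between = len(a) * (mean_a - grand) ** 2 + len(b) * (mean_b - grand) ** 2
    ss_within = sum((x - mean_a) ** 2 for x in a) + sum((x - mean_b) ** 2 for x in b)
    return ss_between / (ss_within / (len(a) + len(b) - 2))

def pooled_t(a, b):
    """Student's t for two independent groups, using the pooled variance."""
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    return (mean(a) - mean(b)) / sqrt(pooled_var * (1 / na + 1 / nb))

# Approximate critical values of F(1, 198) from standard tables (an assumption of this sketch).
F_CRIT_05, F_CRIT_01 = 3.89, 6.76

def significance_mark(f_ratio):
    """Mark an F ratio as Table 1 does: '**' at .01, '*' at .05, '' otherwise."""
    if f_ratio >= F_CRIT_01:
        return "**"
    if f_ratio >= F_CRIT_05:
        return "*"
    return ""

# Invented scores for two groups, used only to show that F equals t squared.
specialty = [38, 41, 36, 40, 39]
engineers = [46, 44, 47, 43, 45]
print(two_group_f(specialty, engineers), pooled_t(specialty, engineers) ** 2)

# Marks for the Accountant row of Table 1 (F_AB, F_AC, F_BC).
print([significance_mark(f) for f in (4.46, 4.34, 17.60)])
```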


specific differences lie. Table 2 gives the 
mean for each group in terms of Strong stand- 
ard scores with the equivalent letter grade as 
commonly utilized in the Strong Test. 
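The conversion from standard score to letter grade can be sketched as a lookup, assuming the conventional Strong cutoffs (A at 45 and above, B Plus 40 to 44, B 35 to 39, B Minus 30 to 34, C Plus 25 to 29, C below 25); the article itself does not state these cutoffs. They reproduce every letter grade in Table 2 except the sales engineers' Production Manager entry (45, printed as B Plus), so the sketch is an approximation of the table, not a correction of it.

```python
# Assumed conventional cutoffs for Strong letter ratings (not stated in this article).
LETTER_CUTOFFS = [(45, "A"), (40, "B Plus"), (35, "B"), (30, "B Minus"), (25, "C Plus")]

def strong_letter_grade(standard_score):
    """Return the letter rating for a Strong standard score under the assumed cutoffs."""
    for lower_bound, grade in LETTER_CUTOFFS:
        if standard_score >= lower_bound:
            return grade
    return "C"

# Route salesmen's means from Table 2.
route_means = {"Accountant": 39, "Office Worker": 50, "Personnel Manager": 41, "Advertising Man": 34}
for scale, score in route_means.items():
    print(f"{scale}: {score} -> {strong_letter_grade(score)}")
```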


Discussion 


Certain differentiating patterns emerge that 
may be helpful in distinguishing specialty 
salesmen, route salesmen, and sales engineers. 
It will be noted that sales engineers, as represented by the sample, rate significantly

Table 2
Means for Each of the Three Sales Groups in Terms of Strong Standard Scores
with Equivalent Letter Grades

                           Specialty Salesmen     Route Salesmen       Sales Engineers
                           Standard  Letter       Standard  Letter     Standard  Letter
Interest Scale             Score     Grade        Score     Grade      Score     Grade
Production Manager         39        B            40        B Plus     45        B Plus
Personnel Manager          43        B Plus       41        B Plus     45        A
Accountant                 36        B            39        B          33        B Minus
Office Worker              45        A            50        A          39        B
Purchasing Agent           40        B Plus       44        B Plus     39        B
Sales Manager              52        A            52        A          50        A
Real Estate Salesman       48        A            48        A          44        B Plus
Life Insurance Salesman    49        A            50        A          43        B Plus
Advertising Man            37        B            34        B Minus    35        B


higher on the production manager scale than 
do specialty salesmen or route salesmen. 
There is no significant difference between 
specialty salesmen and route salesmen on 
this scale. 

That sales engineers should surpass the 
other two sales groups on the production 
manager scale probably reflects the fact that 
such salesmen deal with factory or plant 
personnel to a large extent and should share 
some interests with their customers if they 
are to understand and solve production prob- 
lems as a basis for sales. Frequently, both 
sales engineers and production managers 
share a common engineering training. 

On the personnel manager scale, the sales 
engineers sampled rated significantly higher 
than the route salesmen. This may reflect a 
greater liking for working with people and 
their problems in general, which can enable
the sales engineer to sustain his enthusiasm 
through the analysis of technical problems 
and during sometimes prolonged contact with 
customer personnel in the course of selling a 
single customer. 

On the accountant scale, route salesmen 
ranked highest, followed by specialty sales- 
men, with the sales engineers showing least 
interest in common with men in the account- 
ancy field. It should be noted, however, that 
even for the highest group, the route sales- 
men, the equivalent letter grade is only “B.” 


This suggests that salesmen as a group do 
not have strong interests in common with 
men handling computational data as in ac- 
counting, but that route salesmen may need 
to have more of this interest because of the 
record-keeping activities related to the many 
calls they must make each day. 

For the office worker scale, the three groups are also significantly different from each other, with the route salesmen again ranking highest. This probably reflects the stock-checking, bill-collecting, and order-taking activities that are a part of route selling. The sales engineers ranked lowest of the three groups for this factor.

On the purchasing agent scale, the route 
salesmen were again the highest ranking, dif- 
fering significantly from both the specialty 
salesmen and the sales engineers. Thus, for 
the three interest scales, accountant, office 
worker, and purchasing agent, which are con- 
cerned with the handling of business detail, 
the route salesmen are shown to be signifi- 
cantly different in interest pattern. 

This predominance of the route salesmen 
on these factors may be a necessary accom- 
paniment of this type of selling in which a 
regular flow of distribution already exists. 
This is in contrast to the job of the specialty 
salesman who must create a desire to buy on 
the part of a prospect, and in contrast to the 
sales engineer’s job where some technical 




problem may need to be solved for the cus- 
tomer before a sale can be made. 

On the real estate salesman scale, all three groups had interest ratings of either “A” or “B Plus.” The sales engineers, however, ranked significantly lower on this scale than the other two groups. For the life insurance salesman scale, a similar situation applies, with the sales engineers achieving a “B Plus” rating which was significantly lower than the “A” ratings achieved by the other two groups.

This suggests that sales engineers as a group are likely to rank lower on those Strong interest scales dealing directly with interest in selling. This difference probably reflects the fact that sales engineers are generally technically trained men, often graduate engineers, who derive satisfaction from dealing with engineers and other technical men in customers’ organizations, and from solving customers’ technical problems, as well as from the actual sales aspects of the job.

In addition to the differentiating characteristics discussed above, certain common characteristics among the three groups are revealed. One of these is a relative lack of interest in common with advertising men as measured by the Strong advertising man’s scale. For the three groups, the mean Strong rating achieved on this scale is considered an indeterminate one.

“B Plus” or “A” strength interest ratings are shared by the three groups in the areas of personnel manager, sales manager, real estate salesman, and life insurance salesman. While the three groups have in common the fact that they obtained noteworthy ratings on these four scales, Tables 1 and 2 also reveal that on the personnel manager, real estate salesman, and life insurance salesman scales there were significant intergroup differences. These have been pointed out above in the discussion of intergroup differences.


Summary and Conclusions 


Strong interest patterns of three types of 
salesmen were compared. Significant inter- 
group differences were revealed on seven of 
the interest scales studied for this sample of 
300 salesmen. In addition, certain common 
characteristics were also noted with regard to 
five of the interest scales. 

These findings suggest that while sales- 
men as a group may share certain interest 
factors, there are also differentiating aspects 
of their patterns that should be of value to 
school counselors dealing with students who 
are considering a sales occupation, and also 
to business organizations concerned with the 
selection of salesmen. 

This study points to certain tentative con- 
clusions as to the existence and nature of 
these differences as measured by several in- 
terest scales on the Strong Blank. These 
tentative conclusions are, of course, subject 
to confirmation by cross-validation studies 
based on additional independent samples. 

The results of this study support the trend 
away from the concept of salesmen in gen- 
eral toward the concept of special sales oc- 
cupational groups. 


Received February 16, 1956. 


References 


1. Flemming, E. G., & Flemming, C. W. Test se- 
lected salesmen. J. Marketing, 1946 (April), 
1-8. 

2. Strong, E. K., Jr. Vocational interests of men and women. Stanford Univer.: Stanford Univer. Press, 1943.




The Prediction of Rifle Marksmanship 


E. F. MacCaslin and F. J. McGuigan¹


Human Resources Research Office, The George Washington University 


Gates (1) in 1918 tested the abilities of an expert marksman. Spaeth and Dunham (4) and Humphreys, Buxton, and Taylor (2) have related steadiness measures to rifle marksmanship.² It appears, however, that no one has attempted to predict rifle marksmanship from pretraining data. The present study attacks this problem and offers predictive data for Army recruits trained by two different methods, a part method and a whole method.³


Method


Subjects. The experiment was conducted twice, each time at a different military installation. The first administration of the experiment used 68 Ss at Fort Knox, Kentucky; the second, 88 Ss at Fort Jackson, South Carolina. In both cases, the Ss were male light infantry basic trainees with “A” physical profiles.

Procedure. The criterion data were obtained on an Army rifle range during four days of firing. Each S fired a total of 100 rounds in slow fire and 72 rounds in sustained (rapid) fire at target distances of from 100 to 500 yds. For each of the two training methods, the Wherry-Doolittle method was used to obtain the multiple R between each criterion (slow fire; sustained fire) and seven pretraining variables: (a) rifle steadiness, as measured by an ataxiameter (3); (b) firing experience, as scored with a questionnaire⁴; (c) educational level, as defined by the number of years of schooling; (d) and (e) intelligence, as measured by scores on the Armed Forces Qualification Test and by Aptitude Area I scores from the Army Classification Battery; (f) mechanical aptitude, as measured by the score on the Mechanical Aptitude Test of the Army Classification Battery; and (g) mechanical information, as measured by the score on the Shop Mechanics Test of the Army Classification Battery. The data for variables c-g were obtained from the trainees’ Army personal data files. Multiple Rs were also obtained by using only two of the predictor variables, intelligence and firing experience.

1 F. J. McGuigan is now at Hollins College. The research reported here was conducted by the authors while they were employed by the Human Resources Research Office, The George Washington University, operating under contract with the Department of the Army. Opinions and conclusions are those of the writers and do not necessarily represent views of the University or the Department of the Army.

2 The rather high correlation coefficients reported in these studies were not found in a recent study by McGuigan and MacCaslin (3), who found the relationship between rifle steadiness and marksmanship to be relatively low.

3 The data presented in this paper were obtained in the course of a larger study (see McGuigan, F. J., and MacCaslin, E. F. Whole and part methods in learning a perceptual motor skill. Amer. J. Psychol., 1955, 68, 658-661). In that study, the superiority of the whole method over the part method was found to (a) be significant for slow fire for Ss of all levels of intelligence, and (b) approach significance for sustained fire for Ss of above-average intelligence.


Results 


Prediction of rifle marksmanship. The 
variables selected by the Wherry-Doolittle 
method were not selected consistently in each 
administration of the experiment. The fre- 
quency of selection of two variables, intelli- 
gence (Aptitude Area I score) and firing ex- 
perience, and the fact that most of the other 
variables correlated well with intelligence, 
suggested that these two variables be used throughout. Table 1 shows the Wherry-Doolittle and two-variable Rs obtained; the Rs obtained by the two processes showed no significant differences.
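The two-variable R can be computed directly from three Pearson correlations: each predictor's validity and the correlation between the predictors. The sketch below uses the standard formula R² = (r_y1² + r_y2² - 2 r_y1 r_y2 r_12) / (1 - r_12²). The input values are hypothetical, since the article does not report the zero-order correlations, and the stepwise selection and shrinkage steps of the Wherry-Doolittle procedure are not reproduced here.

```python
from math import sqrt

def two_predictor_R(r_y1, r_y2, r_12):
    """Multiple correlation of criterion y with predictors 1 and 2, from Pearson rs."""
    if abs(r_12) >= 1:
        raise ValueError("the two predictors must not be perfectly correlated")
    r_squared = (r_y1 ** 2 + r_y2 ** 2 - 2 * r_y1 * r_y2 * r_12) / (1 - r_12 ** 2)
    return sqrt(r_squared)

# Hypothetical values (not from the study): intelligence vs. criterion,
# firing experience vs. criterion, and intelligence vs. firing experience.
print(round(two_predictor_R(0.30, 0.25, 0.10), 3))
```

When the two predictors are uncorrelated, R reduces to the square root of the sum of the squared validities; a positive correlation between them lowers R, because part of what each predicts is shared.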

Prediction of rifle marksmanship as a func- 
tion of training method. The two-variable 
Rs for each administration of the experiment 
were found to be not significantly different 
from each other and were averaged by means 
of Fisher’s z method. For slow fire, the mean 
two-variable Rs are .38 for the part method 
and .61 for the whole method. The differ- 
ence between these Rs approaches statistical 
significance at the 5% level. For sustained 
fire, the mean two-variable Rs are .32 for the 
part method and .67 for the whole method. 
These Rs differ significantly beyond the 1% 
level. Training by the whole method thus 
appears to give higher predictability for the 


4 The firing experience questionnaire developed for 
the purposes of this study took about 15 min. for 
group administration. 




Table 1 


Correlations Between Predictor Variables and
Marksmanship Criteria

                          Part Method             Whole Method
                          Slow      Sustained     Slow          Sustained
Replication               Fire      Fire          Fire          Fire
Fort Knox
  Wherry-Doolittle R      .44*      .37           (illegible)   (illegible)
  Two-Variable R          .40       .37           (illegible)   (illegible)
  N                       33        33            35            35
Fort Jackson
  Wherry-Doolittle R      .33       .27           (illegible)   (illegible)
  Two-Variable R          .36       .28           .64**         .72**
  N                       44        44            44            44

* Significant beyond the 5% level.
** Significant beyond the 1% level.


pretraining variables studied here than train- 
ing by the part method does. 
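The averaging and comparison of Rs can be sketched with Fisher's z transformation, z = atanh(r). Two assumptions are made here, neither stated in the article: each z is weighted by N - 3 when averaging, and N is pooled over both installations when comparing methods (33 + 44 = 77 for the part method, 35 + 44 = 79 for the whole method). Fisher's z is strictly a test for simple correlations, so applying it to multiple Rs is an approximation. Only the part-method averages can be recomputed, because the Fort Knox whole-method entries in Table 1 are illegible.

```python
from math import atanh, sqrt, tanh

def average_r(pairs):
    """Average correlations through Fisher's z; pairs are (r, N), each z weighted by N - 3."""
    total_weight = sum(n - 3 for _, n in pairs)
    mean_z = sum((n - 3) * atanh(r) for r, n in pairs) / total_weight
    return tanh(mean_z)

def z_for_difference(r1, n1, r2, n2):
    """Normal deviate for the difference between two independent correlations."""
    return (atanh(r2) - atanh(r1)) / sqrt(1 / (n1 - 3) + 1 / (n2 - 3))

# Part-method two-variable Rs from Table 1: (R, N) at Fort Knox and Fort Jackson.
part_slow = average_r([(0.40, 33), (0.36, 44)])
part_sustained = average_r([(0.37, 33), (0.28, 44)])
print(f"part method, slow fire: {part_slow:.2f}; sustained fire: {part_sustained:.2f}")

# Whole vs. part method, using the reported mean Rs and pooled N.
z_slow = z_for_difference(0.38, 77, 0.61, 79)
z_sustained = z_for_difference(0.32, 77, 0.67, 79)
print(f"slow fire z = {z_slow:.2f}; sustained fire z = {z_sustained:.2f}")
```

Under these assumptions the part-method averages come out at .38 and .32, matching the Results, and the slow-fire z falls just short of 1.96 while the sustained-fire z exceeds 2.58, in line with the significance levels reported.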


Summary 


This study obtained multiple correlations 
showing the relationship between seven pre- 
training variables (rifle steadiness, firing ex- 
perience, educational level, two measures of 
intelligence, mechanical aptitude, and me- 




chanical information) and end-of-training 
marksmanship. It was found that two of the 
variables, intelligence and firing experience, 
predicted end-of-training marksmanship sub- 
stantially as well as all seven variables taken 
together. It was also found that higher pre- 
dictability was obtained by using the whole 
method than by using a part method. The 
average two-variable Rs for the whole method 
were .61 for slow fire and .67 for sustained 
(rapid) fire; for the part method, .38 for 
slow fire and .32 for sustained fire. 


Received December 5, 1955. 


References 


1. Gates, A. I. The abilities of an expert marksman tested in the psychological laboratory. J. appl. Psychol., 1918, 2, 1-14.

2. Humphreys, L. G., Buxton, C. E., & Taylor, H. R. 
Steadiness and rifle marksmanship. J. appl. 
Psychol., 1936, 20, 680-688. 

3. McGuigan, F. J., & MacCaslin, E. F. The rela- 
tionship between rifle steadiness and rifle 
marksmanship and the effect of rifle training 
on rifle steadiness. J. appl. Psychol., 1955, 
39, 156-159. 

4. Spaeth, R. A., & Dunham, G. C. The correlation between motor control and rifle shooting. Amer. J. Psychol., 1921, 56, 249-256.




A Criticism of Studies Comparing Item-Weighting Methods 


Robert L. McCornack 
Wayne University 


Recently two successive issues of this Jour- 
nal have contained two articles that perpetu- 
ate a commonly held misconception (3, 14). 
The type of study of concern is one in which 
two or more different weighting methods are 
applied to a set of tests or test items. Vari- 
ous characteristics of the keys thus produced 
are then compared. The characteristic of 
major concern here is validity. Studies of 
this type may be divided into two kinds: 
One kind actually compares the validity of 
the two differently weighted keys and is 
illustrated by Jurgensen (3), although many 
similar articles can be found (1, 6, 8, 11, 12). 
The second kind simply correlates the two 
keys and concludes from this very high cor- 
relation, say .98, that the two keys must 
necessarily have equivalent or nearly equiva- 
lent validities for any given criterion. This 
kind of study is illustrated by Trites and Sells (14) and others (e.g., 2, 4, 5, 7, 9, 10, 13, 15).

Studies of the second kind make false con- 
clusions when they imply or explicitly con- 
clude from the typically high correlation be- 
tween two differently weighted composites 
that these composites must have equal validi- 
ties. For example, Trites and Sells state, “It 
may be concluded, then, that in most instances there is little gained by use of fractional weights” (14, p. 454). In his classic
article Stalnaker, similarly, concludes from a 
correlation of .99 that, “The relationship be- 
tween the weighted and unweighted scores is 
so high, so nearly perfect, that there is little 
justification for the use of weights with these 
examinations. . . . The influence of the usual 
weighting factors is so small as to be insig- 
nificant” (10, p. 490). Again, Webb says, 
“However, since the scores on the Likert scale 
obtained by the Likert and Thurstone scoring 
methods are highly correlated, one might ex- 
pect the Thurstone scale to possess a degree 
of validity approximate to that of the Likert 
scale” (15, p. 469). 


Studies of the first kind make proper con- 
clusions, since validities are actually com- 
pared. However, the same fallacy occasionally 
appears. For example, Jurgensen says, “Cor- 
relations between statistically determined and 
arbitrarily assigned weights were so high that 
they can be considered one and the same” 
(3, p. 307). Only Strong’s voice has been 
raised in opposition. “Our experience. . . 
shows that two systems of testing may cor- 
relate over .90 and have equally high reli- 
ability and yet one may have much higher 
validity than the other” (11, p. 70). 


The Proof 


The magnitude of the error in concluding equal validities for keys that correlate extremely highly will be displayed by considering the simple case of two such keys and one criterion. Together with the requirement that the multiple correlation shall not exceed unity and the formula for multiple correlation in terms of Pearson rs, the following formula can be derived (e.g., 16, p. 280):


$$ r_{2y} = r_{12}\,r_{1y} \pm \sqrt{1 - r_{12}^2 - r_{1y}^2 + r_{12}^2\,r_{1y}^2}, $$


where variables 1 and 2 are scores on the two keys and variable y is the criterion. This formula was used in constructing Table 1 and expresses the limits of the validity of variable 2 in terms of the validity of variable 1 and the intercorrelation between 1 and 2. The purpose of Table 1 is to show the possible differences in validities for highly correlated keys. While these limits are theoretically possible, it should be noted that they would be unlikely to be attained in practice; most of the studies of the first kind noted above illustrate this. The correlations in the table were chosen because they represent typical validities and typical values of the correlation found between two keys used to score the same tests.
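The bound follows in a few lines from the condition just stated. The sketch below is one such derivation, not necessarily the one in the cited source: write the multiple correlation of y on keys 1 and 2, require it not to exceed unity, multiply through by the positive quantity 1 - r_12², and complete the square in r_2y.

$$
\begin{aligned}
R_{y\cdot 12}^2 = \frac{r_{1y}^2 + r_{2y}^2 - 2\,r_{12}\,r_{1y}\,r_{2y}}{1 - r_{12}^2} &\le 1 \\
\iff\quad r_{2y}^2 - 2\,r_{12}\,r_{1y}\,r_{2y} + r_{1y}^2 - \left(1 - r_{12}^2\right) &\le 0 \\
\iff\quad \left(r_{2y} - r_{12}\,r_{1y}\right)^2 &\le \left(1 - r_{12}^2\right)\left(1 - r_{1y}^2\right) \\
\iff\quad \left| r_{2y} - r_{12}\,r_{1y} \right| &\le \sqrt{\left(1 - r_{12}^2\right)\left(1 - r_{1y}^2\right)}
\end{aligned}
$$

Expanding the product under the root gives 1 - r_12² - r_1y² + r_12² r_1y², the radicand in the formula above.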

Taking the values from Jurgensen’s research, the use of the table may be illustrated. If two keys correlate .996 and one key correlates .40 with the criterion, then it is conceivable for the other key to correlate with the criterion as high as .48 or as low as .32. In the opinion of the writer, the difference between .40 and .48 might well be of practical importance in an actual prediction problem. When the correlation between the scores on the two keys is lower, the possible difference between the two validities increases very rapidly. For example, if the keys correlate .98, one key could have a validity of .40 and the other a validity as high as .57 or as low as .21. For a correlation of .92, the validity of one key could be .40 and the other as high as .73 or as low as .01!


Table 1

Maximum and Minimum Validities of One Key, Given
the Validity of the Other Key and the
Correlation Between the Keys

Key Inter-       Validity of Other Key
correlation     .30          .40          .50          .60
.998            24 to 36*    34 to 46     44 to 55     55 to 65
.996            21 to 38     32 to 48     42 to 58     53 to 67
.994            19 to 40     30 to 50     40 to 59     51 to 68
.992            18 to 42     28 to 51     39 to 61     49 to 70
.990            16 to 43     27 to 53     37 to 62     48 to 71
.980            10 to 48     21 to 57     32 to 66     43 to 75
.960            02 to 56     13 to 64     24 to 72     34 to 82
.940           -04 to 61     06 to 69     17 to 77     29 to 84
.920           -10 to 65     01 to 73     12 to 80     24 to 87
.900           -15 to 69    -04 to 76     07 to 83     19 to 89

* Decimal points have been omitted from the body of the table.
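The table can be regenerated from the formula r_2y = r_12 r_1y ± √((1 - r_12²)(1 - r_1y²)). The sketch below does so and prints each cell as a signed range with decimal points kept. The function name `validity_bounds` is illustrative only.

```python
from math import sqrt

def validity_bounds(r_12, r_1y):
    """Lowest and highest possible r_2y, given r_12 and r_1y.

    Implements r_2y = r_12 * r_1y -/+ sqrt((1 - r_12**2) * (1 - r_1y**2)).
    """
    centre = r_12 * r_1y
    half_width = sqrt((1 - r_12 ** 2) * (1 - r_1y ** 2))
    return centre - half_width, centre + half_width

INTERCORRELATIONS = [0.998, 0.996, 0.994, 0.992, 0.990, 0.980, 0.960, 0.940, 0.920, 0.900]
VALIDITIES = [0.30, 0.40, 0.50, 0.60]

for r_12 in INTERCORRELATIONS:
    cells = []
    for r_1y in VALIDITIES:
        low, high = validity_bounds(r_12, r_1y)
        cells.append(f"{low:+.2f} to {high:+.2f}")
    print(f"{r_12:.3f}   " + "   ".join(cells))
```

For instance, `validity_bounds(0.996, 0.40)` gives about .32 and .48, and `validity_bounds(0.92, 0.40)` about .01 and .73, the values quoted in the illustration. One printed cell differs slightly from the formula: for an intercorrelation of .960 and a validity of .60 the computed limits are .35 and .80 rather than 34 to 82.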

Table 1 has other applications. Much of 
the research on the speed vs. power issue can 
be similarly criticized. The common practice 
of “validating” a new test by correlating it 
with an already extensively validated test 
fallaciously implies that the new test will 
have similar validities. In general, psycholo- 
gists seem to have been rather unduly im- 
pressed with high correlations. 


Conclusion 


The moral is plain. One may use two dif- 
ferent sets of item weights to score the same 
group of tests and find an exceedingly high 
correlation between the scores thus produced. 
Still, as long as the correlation is not 1.00, it 
is possible to find that the validities of the 




two keys differ to a statistically significant 
and practically important degree. The no- 
tion that just because two keys correlate very 
highly they may be used interchangeably for 
any purpose is false. 


Received December 30, 1955. 


References 


1. Guilford, J. P., Lovell, C., & Williams, R. M. 
Completely weighted versus unweighted scor- 
ing in an achievement examination. Educ. 
psychol. Measmt, 1942, 2, 15-21. 

2. Harper, B. P., & Dunlap, J. W. Derivation and 
application of a unit scoring system for the 
Strong Vocational Interest Blank for women.
Psychometrika, 1942, 7, 289-295. 

3. Jurgensen, C. E. Item weights in employee rat- 
ing scales. J. appl. Psychol., 1955, 39, 305- 
307. 

4. Kogan, L., & Gehlmann, F. Validation of the 
simplified method for scoring the Strong Vocational Interest Blank for men. J. educ.
Psychol., 1942, 33, 317-320. 

5. Lester, H., & Traxler, A. E. Simplified method 
for scoring the Strong Vocational Interest 
Blank applied to a secondary-school group. 
J. educ. Psychol., 1942, 33, 629-631. 

6. Perry, D. K. Forced-choice vs. L-I-D response 
items in vocational interest measurement. J. 
appl. Psychol., 1955, 39, 256-262. 

7. Peterson, B. M., & Dunlap, J. W. A simplified 
method for scoring the Strong Vocational In- 
terest Blank. J. consult. Psychol., 1941, 5, 
269-274. 

8. Phillips, A. J. Further evidence regarding 
weighted versus unweighted scoring of ex- 
aminations. Educ. psychol. Measmt, 1943, 3, 
151-155.

9. Potthoff, E. F., & Barnett, N. E. A comparison 
of marks based upon weighted and unweighted 
items in a new-type examination. J. educ. 
Psychol., 1932, 23, 92-98. 

10. Stalnaker, J. M. Weighting questions in the 
essay-type examination. J. educ. Psychol., 1938, 29, 481-490.

11. Strong, E. K., Jr. Procedure for scoring an in- 
terest test. Psychol. Clinic, 1930, 19, 63-72. 

12. Strong, E. K., Jr. Weighted vs. unit scales. J. 
educ. Psychol., 1945, 36, 193-216. 

13. Strong, E. K., Jr., & Carter, H. D. Efficiency
plus economy in scoring an interest test. J. 
educ. Psychol., 1935, 26, 579-586. 

14. Trites, D. K., & Sells, S. B. A note on alterna- 
tive methods for estimating factor scores. J. 
appl. Psychol., 1955, 39, 455-456. 

15. Webb, S. C. A generalized scale for measuring 
interest in science subjects. Educ. psychol. 
Measmt, 1951, 11, 456-469. 

16. Yule, G. U., & Kendall, M. G. An introduction 
to the theory of statistics. London: Griffin, 
1940.




Leadership Opinions as Forecasts of Supervisory Success 


Bernard M. Bass 


Louisiana State University 


Fleishman (3) described the development 
by the Ohio State Leadership Studies of a 
measure of leadership attitudes in industry. 
The Leadership Opinion Questionnaire yields 
two scores, Consideration and Initiating Struc- 
ture, with respective estimated reliabilities of 
.70 and .79 and an intercorrelation of -.01.
A foreman with a high Initiating Structure 
score would favor assigning people in the 
work group to particular tasks, criticizing 
poor work, and emphasizing the meeting of 
deadlines. A foreman with a high Consid- 
eration score would emphasize getting the 
approval of the work group on important 
matters before going ahead; he would stress 
willingness to make changes and the doing of 
personal favors for people in the work group 
(1). 

Fleishman applied the scales to evaluating 
the effects of a leadership training program 
among International Harvester supervisors. 
He noted that foremen who operated under 
more considerate “climates” described them- 
selves as more considerate. A more consid- 
erate climate was one where the foremen be- 
lieved they had more considerate bosses, and 
the bosses wanted their foremen to be more 
considerate. Low positive correlations also 
existed between the foremen’s attitudes of 
consideration and initiation, their bosses’ be- 
havior, and the foremen’s estimates of what 
was expected by the bosses (2). 

Labor grievances were lower where the 
foremen’s bosses expected foremen to be con- 
siderate, where foremen were described so, 
and to a lesser extent where the foremen per- 
ceived such expectations by their bosses (1). 

These results led to the expectation that 
the Leadership Opinion Questionnaire could 
be used to forecast success as a supervisor in 
a company which has been a leading exponent 
of progressive personnel practices in recent 
years and where heavy emphasis has been 
placed on the value of the individual worker. 
(High capital investment and relatively low 


labor costs of production make such an en- 
lightened attitude both sensible and possible.) 
It was hypothesized that supervisors in this 
setting, who held more favorable attitudes 
toward consideration as a mode of leadership 
behavior, would be rated more highly by 
their supervisors. 


Method 


Seventy-seven supervisors, most of whom were at 
the lowest or second lowest level in the management 
hierarchy of a petrochemical refinery, were adminis- 
tered the Leadership Opinion Questionnaire in which 
they indicated what they, as supervisors, ought to 
do, not what they actually did do. The Initiating 
Structure and Consideration scores obtained from 
the questionnaire were correlated with forced-choice 
performance reports collected for 53 of these super- 
visors approximately two years later. These per- 
formance reports by superiors had been found to 
discriminate validly (r=.62 to .84) among super- 
visors voted high, medium, and low, by pooled judg- 
ments (4). Odd-even and equivalent form reliabili- 
ties were above .90. 


Results 


In line with expectations, a correlation of 
.29, significant at the 5% level for 51 df, 
was obtained between Consideration and the 
forced-choice performance report two years 
later. At the same time, a correlation of only 
-.09 was found between rated success as a supervisor and attitudes favoring Initiating Structure. Thus, the extent to which supervisors conformed verbally in attitude to the company Zeitgeist forecast their rated success as supervisors two years later.
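The significance statement can be checked with the usual t test for a correlation, t = r√df / √(1 - r²), where df = 51 for the 53 supervisors with performance reports. The critical value of 2.01 used below is an approximate two-tailed 5% table value and is an assumption of this sketch.

```python
from math import sqrt

def t_for_r(r, df):
    """t statistic for testing a Pearson r against zero, with df = N - 2."""
    return r * sqrt(df) / sqrt(1 - r ** 2)

T_CRIT_05_DF51 = 2.01  # approximate two-tailed 5% critical value of t for 51 df (assumed)

for label, r in [("Consideration", 0.29), ("Initiating Structure", -0.09)]:
    t = t_for_r(r, 51)
    print(f"{label}: r = {r:+.2f}, t(51) = {t:+.2f}, significant: {abs(t) > T_CRIT_05_DF51}")
```

The Consideration correlation gives t a little above the critical value, while the Initiating Structure correlation gives |t| well below 1.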

On the one hand, the obtained correlation 
of .29 between Consideration and future suc- 
cess as a supervisor is probably too low by 
itself for practical significance as a predictor 
of supervisory success; on the other hand, 
the subjects, already employed supervisors, 
were undoubtedly more homogeneous in train- 
ing, attitude, and ability than candidates or 
applicants. A higher correlation would be 
expected in a more heterogeneous sample. 
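The argument about homogeneity is made qualitatively. One conventional way to quantify it, not used in the article, is Thorndike's Case II correction, which assumes that restriction arose from direct selection on the predictor and that the regression is linear and homoscedastic. It requires the ratio of the predictor's standard deviation in an unrestricted applicant group to its standard deviation among the employed supervisors; the article reports no such figure, so the ratios below are hypothetical.

```python
from math import sqrt

def correct_for_range_restriction(r, sd_ratio):
    """Thorndike Case II correction for direct restriction of range on the predictor.

    r        -- correlation observed in the restricted (already employed) group
    sd_ratio -- predictor SD in the unrestricted group / predictor SD in the restricted group
    """
    return r * sd_ratio / sqrt(1 - r ** 2 + (r * sd_ratio) ** 2)

# Hypothetical SD ratios; 1.0 means no restriction and leaves r unchanged.
for ratio in (1.0, 1.25, 1.5):
    print(ratio, round(correct_for_range_restriction(0.29, ratio), 2))
```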




The Leadership Opinion Questionnaire may 
provide a valuable addition to a supervisory 
selection battery in organizations emphasiz- 
ing the need for supervisors to be considerate. 


Summary 


The Leadership Opinion Questionnaire was 
administered to supervisors of a firm noted 
for its emphasis on progressive personnel re- 
lations and interest in the welfare of the in- 
dividual employee. A correlation of .29 was 
found between the extent to which a super- 
visor believed he ought to be considerate of 
his subordinates and the extent to which he 
was rated a successful supervisor by his su- 
periors two years later. No consistent rela- 




tion was found between favoring Initiation 
of Structure and rated success. 


Received December 5, 1955. 


References 


1. Fleishman, E. Leadership climate and supervisory 
behavior. Personnel Res. Bd., Ohio State 
Univer., 1951. 

2. Fleishman, E. Leadership climate, human relations training and supervisory behavior. Personnel Psychol., 1953, 6, 205-222.

3. Fleishman, E. The measurement of leadership 
attitudes in industry. J. appl. Psychol., 1953, 
37, 153-158. 

4. Standard Oil Company. Made to measure. Baton 
Rouge, La.: Standard Oil Co. (N. J.), Em- 
ployee Relat. Dep., 1951. 




Validation of an Attitude Scale as a Device for Predicting 
Behavior 


Peter A. Holman 
Guided Missile Division, Firestone Tire and Rubber Company, Los Angeles 


The scarcity of reports on the validation of 
attitude measures in terms of their ability to 
predict individual respondent behavior has 
been cited by McNemar (8), Campbell and 
Katona (2), and Blankenship (1), as well as 
others. In one of the few reported studies, a 
correlation of .024 was found between an 
attitude scale on cheating and actual cheat- 
ing behavior by a class of college students 
(3). Validity of respondent answers has 
most often been checked by comparing an- 
swers with records (6) or evidence of recent 
behavior (7). 

The study described in this paper was an 
exploratory design for measuring the validity 
of individual prediction of behavior over vari- 
ous periods of time and the predictive va- 
lidity of a related attitude scale. 

The study required (a) frequent access to 
the sample and (b) a behavioral criterion 
which was well defined. The sample selected 
consisted of college students. The behavioral 
criterion selected was attendance at college 
football games.

In the fall of 1950, a week prior to the 
football season, a questionnaire was adminis- 
tered to 253 students at the University of 
Southern California. The questionnaire listed 
the University’s football games; the students 
indicated whether they would or would not 
attend each game, or were doubtful. The 
questionnaire also contained 12 questions related to attitude toward football games. A sample question and its answer categories follow:

“How would you describe attendance at football games? (check one)

“Very worthwhile ____; Worthwhile ____; Not very worthwhile ____; Worthless ____.”
Each student signed his questionnaire. Scor- 
ing weights for the answer categories ranged 
from 1 through 4, in order from least to 
most favorable. Answer-category scores were 
summed to yield scale scores. 
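The scoring rule can be sketched in a few lines: each of the 12 answers carries a weight from 1 (least favorable) to 4 (most favorable), and the weights are summed, giving a possible range of 12 to 48. The median split used later for the correlations is included as well; counting a score as "high" only when it lies above the median is an assumption, since the article does not say how ties at the median were assigned. The respondents are invented.

```python
from statistics import median

def scale_score(weights):
    """Sum of 12 item weights, each 1 (least favorable) to 4 (most favorable)."""
    if len(weights) != 12 or any(w not in (1, 2, 3, 4) for w in weights):
        raise ValueError("expected 12 item weights between 1 and 4")
    return sum(weights)

# Hypothetical respondents (not data from the study).
scores = [scale_score(w) for w in (
    [4] * 12,
    [3] * 12,
    [3, 4, 3, 3, 4, 3, 2, 3, 4, 3, 3, 4],
    [2, 3, 2, 2, 3, 3, 2, 2, 3, 2, 3, 2],
)]
cut = median(scores)                      # critical score for the attitude dichotomy
high_attitude = [s > cut for s in scores]  # assumed: scores equal to the median count as low
print(scores, cut, high_attitude)
```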


On each Monday following a game, the 
students present in class (whose number varied from 193 to 234) were given a brief form on which
they checked whether they had or had not 
attended the game. The students also signed 
these forms. (It is assumed in this study 
that the validity of reported attendance was 
high and consistent from game to game.) 
The possibility that students biased their 
postgame reports in accordance with their 
preseason prediction is considered negligible. 
The students were not aware of the purpose 
of the study; those who made guesses thought 
the study related to the effect of televised 
games on game attendance. 

Since the students signed both the preseason survey and the follow-up surveys, it was possible to tabulate for each student his predictions, scale score, and subsequent behavior (5).

The attitude scale had a possible score 
range of 12 to 48. The obtained range was 
27 to 47, the median 39.5, the mean 38.4, 
the standard deviation 3.7. The mean for 
those predicting attendance at the first game 
was 42.7; for those predicting nonattendance, 
36.5. The median was used as a critical 
score for computation of attitude scale correlations. The correlation used is the phi coefficient, corrected for restriction in size (phi/phi max) (4, p. 433). All correlations were statistically significant (p < .01).
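The correction just mentioned can be illustrated with a minimal Python sketch of phi and phi/phi max for a 2 × 2 table, for example scale score above or below the median crossed with attended or did not attend. The cell labels a, b, c, d and the function names are illustrative assumptions. The sketch uses the common textbook form of phi max for a positive association, computed from the observed marginal proportions. It is not taken from the paper or from Guilford's worked example.

```python
import math


def phi(a: float, b: float, c: float, d: float) -> float:
    """Phi coefficient for the 2x2 table [[a, b], [c, d]]."""
    numerator = a * d - b * c
    denominator = math.sqrt((a + b) * (c + d) * (a + c) * (b + d))
    return numerator / denominator


def phi_max(a: float, b: float, c: float, d: float) -> float:
    """Largest positive phi attainable with the table's observed margins."""
    n = a + b + c + d
    p_row = (a + b) / n  # proportion in the first row (e.g. above-median scale score)
    p_col = (a + c) / n  # proportion in the first column (e.g. attended)
    # With fixed margins, cell a can hold at most min(p_row, p_col) of the cases.
    numerator = min(p_row, p_col) - p_row * p_col
    denominator = math.sqrt(p_row * (1 - p_row) * p_col * (1 - p_col))
    return numerator / denominator


def phi_over_phi_max(a: float, b: float, c: float, d: float) -> float:
    """Phi rescaled by its ceiling under the observed margins."""
    return phi(a, b, c, d) / phi_max(a, b, c, d)


# Hypothetical counts: rows = high/low scale score, columns = attended/absent.
print(round(phi(30, 0, 10, 60), 3))               # 0.802: raw phi, held down by unequal margins
print(round(phi_over_phi_max(30, 0, 10, 60), 3))  # 1.0: as extreme as these margins allow
```

The example shows why the ratio is used. With unequal margins, a table as lopsided as the margins permit still gives a raw phi of only about .80, while phi/phi max reports it as 1.0.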

Attendance at all games was less than pre- 
dicted (Table 1). Increased error with re- 
moteness in time is shown for those predict- 
ing attendance; predictions of nonattendance 
were much more valid. The “doubtful” 
group in general attended games in larger 
proportions than those predicting nonattend- 
ance, but not in proportion to original group 
prediction, subsequent group attendance, or 
any other proportion revealed by the data. 

The increase in error of prediction with 
remoteness in time is again shown in Table 2. 




348 Peter A. Holman 
Table 1
Preseason Prediction and Subsequent Attendance

                                                        Game
Category                          First  Second  Third  Fourth  Fifth  Sixth  Seventh
Size of sample                      231     234    210     196    193    199      205
Predicting attendance (%)            66      74     65     33*     62     80       76
Attending (%)                        65      61     63      25   38**     51       56
“Will attend” attending (%)          96      80     72      60     51     55       62
“Will not attend” attending (%)       7      12      5       4     12      0       10
“Doubtful” attending (%)             21       7     21      13     13      0       19

* An “away” game customarily attended by many students.
** Rain.


The error increased sharply between the fourth and fifth games (p < .01).

The correlations of scale score with prediction and with behavior showed an inverse relationship, indicating the fallibility of prediction as a criterion for attitude scale validity. (The correlations indicate that attitude was a stronger determiner of attendance at the fourth (away) game than at any of the other games. A contrary conclusion would have been drawn if prediction were the criterion.) The correlations between scale score and prediction tended to increase over the season and in general were inverse to the trend of the correlations between student prediction and attendance. The correlation between scale score and total games attended was higher (p < .01) than the correlation between scale score and total game attendance predicted.

Observation indicated that “football attitude” was not the only factor determining game attendance. Some students worked Saturdays; for some students game attendance was a social outing. If these factors had been considered in advance, the initial survey could have determined which students worked Saturdays, and a “social attitude” scale could have been added to the questionnaire. Knowledge of these factors might have made possible a better test of the relationship of “football attitude” to attendance.

As a test of the possible predictive validity of the scale under such conditions, the writer predicted that (a) those students with high scale scores who predicted attendance would attend, and (b) those students with low scale scores who predicted nonattendance would not attend. The correlations between prediction and attendance for these two groups, shown in the last row of Table 2, are higher (p < .01) than the correlations between student prediction and attendance (first row of Table 2) for all except the second game.

The data suggested that attitude scale 
scores could be used to make a more accu- 


Table 2
Correlation (Phi) Between Prediction and Subsequent Attendance

                                                            Game
Variables                          First  Second  Third  Fourth  Fifth  Sixth  Seventh  Total Games
Prediction and attendance (S)*       …       …      …      …       …      …     .46        .80
Scale score and prediction          .39     .53    .64    .34     .68    .61    .59        .33
Scale score and attendance          .49     .34    .39    .59     .27    .21    .36        .41
Prediction and attendance (Z)**     .99    −.16    .95    .92     .57    .78    .91        .96

(… = illegible in the source.)

* Preseason prediction by students (does not include doubtful category).
** Prediction by author for scale-selected groups.




Validation of an Attitude Scale 349 
Table 3
Percentage-Point Error* in Prediction

                                                      Game                              Average
Predictor                            First  Second  Third  Fourth  Fifth  Sixth  Seventh  Error
Students                              +1     +13    +12     +8     +24    +29    +20      15
E using 50:50 split of “dubious”      −3     +11    +14    +12     −8     −8     −6        9
E using ratio split of “dubious”      −7       0     +7    +19     −4     −6     −6        7

* (Predicted attendance %) − (Attendance %) = percentage-point error.


rate prediction of behavior than the predic- 
tion of the sample itself: a prediction of at- 
tendance was made for students with high 
scale scores predicting attendance. A pre- 
diction of nonattendance was made for stu- 
dents predicting nonattendance, regardless of 
scale scores. Predictions for the remaining 
students (dubious group) were made in two 
ways: (a) they would attend in ratio to the 
other groups, and (b) half would attend. 
Predictions of attendance were then summed 
to yield total predicted attendance. 
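The aggregation step just described can be sketched in a few lines. This is a hypothetical illustration: the group counts are invented. Reading “attend in ratio to the other groups” as “attend at the rate predicted for the two predictable groups combined” is an assumption about the author's procedure, not something the text spells out.

```python
def predicted_attendance_pct(n_attend: int, n_absent: int, n_dubious: int,
                             split: str = "ratio") -> float:
    """Percentage of the sample predicted to attend one game.

    n_attend  -- students predicted to attend (high scale score, predicted attendance)
    n_absent  -- students predicted not to attend (predicted nonattendance)
    n_dubious -- all remaining students
    split     -- "ratio": dubious students attend at the predictable groups' rate
                 (an assumed reading of the text); "half": half of them attend
    """
    predictable = n_attend + n_absent
    if split == "ratio":
        share = n_attend / predictable
    elif split == "half":
        share = 0.5
    else:
        raise ValueError(f"unknown split: {split!r}")
    # Sum the individual predictions of attendance, as described above.
    expected_attending = n_attend + share * n_dubious
    return 100 * expected_attending / (predictable + n_dubious)


def percentage_point_error(predicted_pct: float, actual_pct: float) -> float:
    """Table 3 convention: (predicted attendance %) - (attendance %)."""
    return predicted_pct - actual_pct


# Hypothetical game: 60 predicted attenders, 20 predicted absentees, 20 dubious.
print(predicted_attendance_pct(60, 20, 20, "ratio"))  # 75.0
print(predicted_attendance_pct(60, 20, 20, "half"))   # 70.0
```

Under this reading, the ratio split always reproduces the predictable groups' own attendance rate. The 50:50 split pulls the estimate toward one half.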

Splitting the dubious group in ratio to the “predictable” groups enabled substantially more accurate prediction than the students' own predictions for the last three games (Table 3). Splitting the dubious group 50:50 enabled more accurate prediction in five of seven games and a much smaller overall season error.
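The “Average Error” column of Table 3 appears to be the mean of the absolute percentage-point errors across the seven games, rounded to a whole point. That reading is an inference from the table, not something the text states. The short check below uses the rows as printed. The sign of the fourth-game entry in the 50:50 row is uncertain in the source, but it does not affect an absolute mean.

```python
# Percentage-point errors per game, transcribed from Table 3.
errors = {
    "Students": [1, 13, 12, 8, 24, 29, 20],
    "50:50 split of dubious": [-3, 11, 14, 12, -8, -8, -6],
    "ratio split of dubious": [-7, 0, 7, 19, -4, -6, -6],
}


def mean_absolute_error(row: list[int]) -> float:
    """Average size of the error, ignoring whether it over- or under-predicts."""
    return sum(abs(e) for e in row) / len(row)


for predictor, row in errors.items():
    print(f"{predictor}: {mean_absolute_error(row):.1f}")
```

The three means come out near 15.3, 8.9, and 7.0, which round to the 15, 9, and 7 shown in the table.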


Summary 


The study showed that (for the sample and behavioral area studied) although predictions of future behavior were not highly valid and the predictive validity of the attitude scale used was lower still, a high degree of predictive validity might be secured by grouping individuals into categories determined by both attitude-scale scores and individual predictions.

The study also indicated that successful validation of an attitude scale with a behavioral criterion requires that the attitude measured be the primary factor affecting behavior. When other variables may affect behavior, additional data should be secured and the sample should be fractionated into at least three categories: (a) those who have freedom of choice and who are not primarily motivated toward the behavioral criterion by factors other than the attitude under study; (b) those with a positive attitude but who are restricted in their behavior by other factors; (c) those with a relatively negative attitude but who may respond positively to the behavioral criterion because of factors or attitudes other than the attitude under study.


Received February 13, 1956. 
Early Publication. 


References 


1. Blankenship, A. B. Consumer and opinion re- 
search. New York: Harper, 1943. 

2. Campbell, A. A., & Katona, G. The sample survey: a technique for social science research. In L. Festinger & D. Katz (Eds.), Research methods in the behavioral sciences. New York: Dryden, 1953.

3. Corey, S. M. Professed attitudes and actual be- 
havior. J. educ. Psychol., 1937, 28, 271-280. 

4. Guilford, J. P. Psychometric methods. (2nd ed.) New York: McGraw-Hill, 1954.

5. Holman, P. A. Validity of the individual re- 
sponse in public opinion measurement. Un- 
published master’s thesis, Univer. of Southern 
California, 1956. 

6. Hyman, H. Do they tell the truth? Publ. Opin. Quart., 1944, 8, 557-559.

7. Link, H. G., & Freiberg, A. D. The problems of 
validity vs. reliability in public opinion polls. 
Publ. Opin. Quart., 1942, 6, 87-98. 

8. McNemar, Q. Opinion-attitude methodology. 
Psychol. Bull., 1946, 43, 289-374. 


The Journal of Applied Psychology 
Vol. 40, No. 5, 1956 


A Note on the Spanish Language Form of the Oral 
Directions Test of Intelligence 


Victor D. Sanua 


Psychiatric Services Division, Institute of Physical Medicine and Rehabilitation, 
New York University—Bellevue Medical Center 


Nearly 700,000 persons of Puerto Rican 
origin are now residents in the U. S. A. with 
the greatest concentration in the New York 
Metropolitan area. Only a small percentage 
of this large and rapidly increasing segment 
of the population is able to speak English 
fluently. 

The impact of large groups of Spanish- 
speaking children on metropolitan schools has 
been widely recognized, and strenuous efforts 
to understand the cultural and intellectual 
factors involved in adapting the school pro- 
gram to the language problem are now under 
way. A more diffuse problem is presented by 
the adult members of the group. Communi- 
cation difficulties are particularly serious in 
working toward the most productive eco- 
nomic utilization of this new element in the 
available labor force.

On the practical level of individual job de- 
cisions, the problem becomes obvious at the 
point of evaluating an applicant’s intellectual 
abilities, job level classification, and poten- 
tial for job training. Suitable testing pro- 
cedures for this necessary process with Span- 
ish-speaking Americans would be useful to 
vocational guidance agencies, employers, and 
others whose work effectiveness is blocked by 
the communication problem. 

As a Spanish-speaking psychologist in a re- 
habilitation center, the writer had the oppor- 
tunity to administer the Wechsler-Bellevue 
Test to Puerto Rican Americans who had in- 
curred a disability following a work accident. 
The patients were referred to the Institute of 
Physical Medicine and Rehabilitation at New 
York University for vocational advisement 
supplementary to medical treatment. The 
use of experimental Spanish forms of Wech- 
sler scales with Puerto Rican adults proved 
of small value. Even if an official version of the scales becomes available, it is doubtful whether a long, individually administered test would satisfy the major needs for estimating the intelligence of Puerto Rican adults. There are very few Spanish-speaking psychologists, and, of course, the Wechsler scales are too time-consuming to be used for screening purposes in industrial selection or for quick classification in guidance agencies. The practical situation requires a short group test which can be administered to adults who speak little or no English. There should be no requirement for professional training or special language skills in the test administration. It is obviously unrealistic to demand the services of Spanish-speaking psychologists in coping with the huge volume of guidance and selection problems that are arising.

The Oral Directions Test, which was origi- 
nally developed to provide an intelligence 
score for job applicants covering the widest 
practical range of education and ability, has 
been issued in a Spanish-language version for 
specific use in screening job applicants at a 
large refinery in South America. The entire 
test, including all directions in the Spanish 
language, is orally administered by use of a 
15-minute magnetic tape recording. The test 
is simple and practical and can be used for 
individual or group administration. The test 
administrator requires no special training 
other than knowing how to manage a group 
testing situation. The timing of the test is 
automatically standardized by the recorded 
presentation of all instructions and test items. 

The test is readily available and appears to 
meet the ideal requirements for many press- 
ing applications in screening and classifying 
non-English-speaking Puerto Rican Ameri- 
cans. The population of interest is known 
in advance to cover a wide range of ages and 
ability, and to include large numbers of adults 
of limited education and vocational experience. The major uncertainty relative to the use of the Spanish version of the test with Puerto Rican groups lies in the question whether a translation suited for Spanish-speaking Venezuelan workers would be fully comprehensible to Spanish-speaking Puerto Ricans. It is possible that linguistic and dialectal differences between the cultures would invalidate the use of the test in a different, distant, and isolated Spanish-speaking region.

During a recent visit (1956) to San Juan, 
Puerto Rico, the writer had an opportunity 
to investigate the suitability of the existing 
Spanish form with Puerto Rican groups. The 
recorded version was first reproduced in the 
presence of Puerto Rican psychologists on the 
staff of the Veterans Administration in San 
Juan and a group of language teachers em- 
ployed in Puerto Rican educational institu- 
tions. The consensus clearly indicated that 
the recorded version contained no linguistic 
problems unsuited for the Puerto Rican popu- 
lation. Though there are vocabulary, usage, 
and idiomatic variants peculiar to the widely 
separated Spanish-speaking populations of 
Central and South America, the simplicity of 
language in the test avoided the problem. 
Language comprehension is unquestionably a 
factor in the test performance of individuals, 
but the user of the Spanish version of the 
Oral Directions Test with Puerto Rican 
groups can be confident that any such fac- 
tors present in the scores properly reflect the 
comprehension ability of the individual. The 
scores are not invalidated by the use of inap- 
propriate linguistic elements in the directions 
or the items of the test. 

The utility of the test was evaluated by trial in three groups of students in Puerto Rican schools. The first group were young adults attending evening classes of the school system of San Juan. Most of them belonged to the working class and were pursuing regular school studies. At the time of the testing they had reached the fifth or sixth grade curricular level of education. The median score of this group (11 points) is close to the median score (12 points) of the 1,281 laborers tested in Venezuela. The second group were trainees in aviation mechanics attending the Michel Such Metropolitan School of San Juan, one of the world's largest vocational schools. One of the requirements for admission to this program is a high school education. As expected, because of the selective admission, this group obtained higher scores (median = 28) than the evening school class. Forty-four girls and boys comprising the eleventh-grade students attending the Escuela Superior de la Universidad formed the third group. The school is attached to the Faculty of Education of the University of Puerto Rico. Most of these students are children of university personnel, including academic appointees, office staff, and maintenance workers. The median score of this select group was 32. There were no difficulties in administering the test. The sound reproduction was clear. The management of the materials was easy, and the students enjoyed the experience.

Table 1
Oral Directions Test, Spanish Language Form
Frequency Distribution of Scores Obtained by Young Adult Students in San Juan, Puerto Rico

Group 1: 33 men and women enrolled in evening classes. 6th grade level. Ages 17-43.
Group 2: 20 male high school graduates studying aviation mechanics. Ages 17-21.
Group 3: 44 eleventh grade boys and girls in a university high school. Ages 16-19.

[Frequency distributions for Groups 1-3 over score intervals 10-11 through 38-39* are illegible in the source.]

Median: Group 1, 11; Group 2, 28; Group 3, 32.

* Maximum possible score = 39.




Table 1 presents details of the frequency distributions of the three groups. The two extreme groups are sharply discriminated, and the vocational school group exhibits the expected wide range and overlap. It is apparent that the test effectively covers a very wide range of ability.

In view of the present problem of testing Puerto Ricans in New York City and elsewhere, the Spanish form of the Oral Directions Test should prove useful and practical in settings where an intelligence score is needed for screening, guidance, or placement in training.


Received May 16, 1956. 
Early Publication. 


Reference 


1. Langmuir, C. R. Test de instrucciones orales, a Spanish edition of the PTI-Oral Directions Test. (Translation and adaptation by David Cook, Creole Petroleum Corporation, Caracas, Venezuela.) New York: The Psychological Corporation, 1955.




Journal of Applied Psychology 


Vol. 40, No. 6


DECEMBER, 1956 


A Test of the Effects of Pregnenolone Methyl Ether on Subjective 
Feelings of B-29 Crews After a Twelve-Hour Mission ¹


Saul B. Sells, John R. Barry, David K. Trites, and Herman I. Chinn ²
Air University, USAF School of Aviation Medicine, Randolph Field 


Pregnenolone methyl ether (PME) has been reported by Huffman and his associates (1, 5) to produce favorable effects on psychiatric patients in mitigating subjective reactions of fatigue, irritability, anxiety, and fear. Campbell et al. state that (with doses varying from 125 to 250 mg. given from one to four times daily),


The patients with fair consistency reported an al- 
most immediate feeling of relaxation following the 
administration of a therapeutic dose. They reported 
that they could sleep with greater ease and that they 
felt more rested on the following day. There did 
not appear to be any significant over-accentuations 
of euphoria such as does occur with the use of 
amphetamine. A most interesting observation was 
that cases combining severe depression symptoms 
with insomnia showed a considerable relief from 
both, and we have encountered no other single 
medication which has produced this result (1). 


More recently Sleeper (5) reported on a series of 150 private patients, over a period of almost two years, who were given daily divided dosage levels of 150 to 500 mgm. in a digestible oil solution of 25 mgm./cc. strength. He, too, found that the only frequent alteration in patients taking this steroid was a decrease in irritability and anxiety. This change seemed to occur within a limit characteristic of each patient, and increasing the dosage level produced no great change after the limit of improvement was reached. The two patient groups for which Sleeper found most favorable results were involutional patients with anxiety and depression, and psychoneurotic patients who had become stable with passage of time but remained uncomfortable. No significant side effects were found, except an occasional mild rash or mild nausea, possibly due to the oil carrier.

¹ The cooperation and assistance of many persons made this research possible. Appreciation is expressed to Colonel Colin E. Anderson, Commander, 3510th Flying Training Group, for his permission to carry out the study with student crews and for his wholehearted encouragement. Captain Emil Chapla and Major Ben Weeks of the 3510th Flying Training Group assisted substantially by scheduling the briefings and test sessions and advising on administrative arrangements. Major Joseph Quashnock, Department of Flight Medicine, School of Aviation Medicine, USAF, provided medical consultation and attended all testing sessions. The following personnel of the School of Aviation Medicine assisted on technical phases of the study: Major T. C. Kahn, Captain M. R. Seaquist, Dr. Albert Kubala, T/Sgt. Thomas Putnam, S/Sgt. Charles F. Eckel, A/1C Robert Laves, A/1C George L. Sheldon, A/1C Gary Walkup, A/… Wayne Fowler, and A/2C Raphael Dondero.

² The first three authors are in the Department of Clinical Psychology, the fourth in the Department of Biochemistry and Pharmacology.

Operational flying missions of long dura- 
tion, with concomitant exposure to hazard, 
physical discomfort, and deprivation, fre- 
quently involve subjective emotional reactions 
which may cause or be accompanied by reduced capacity for accurate, efficient performance. If efficiency could be increased by
medication which altered neurophysiological 
balances and thus mitigated undesirable sub- 
jective reactions, it would be important to the 
Air Force. 

The experiment described below was un- 
dertaken to determine whether similar effects 
might be produced in normal bomber crew 
personnel, under realistic operational flying 
conditions. The tests were arranged in con- 
junction with a long overwater training mis- 
sion involving between 15 and 18 hours of 
continuous activity, 12 hours of which were 
in flight. The drug was administered after the completion of the initial psychological testing, about an hour after landing. The effects of the drug were assessed later, after the crews unloaded and checked their aircraft and completed a postflight inspection and critique.


Method 


Plan of the Experiment 


Eight student B-29 crews and their instructor crews volunteered to serve as subjects in this experiment. Each crew, including instructors, was normally composed of 15 individuals. The effects of the drug were tested in relation to the anxiety, fatigue, and irritability incident to a long, overwater training mission involving navigation, bombing, and gunnery problems, which is considered by the training group to be extremely tiring and stressful.

Since the side effects of the drug were unknown, 
it was decided to conduct the first test on the 
ground, at the end of the mission, when the effects 
of the long flight and loss of sleep would be height- 
ened and while several terminal activities integral to 
the completion of the mission were yet to be per- 
formed. This procedure provided a good situation 
for the experiment, involving realism in the use of 
operational personnel, and required only 90 minutes 
of additional time for drug administration and psy- 
chological testing. 

The entire group of 120 crew members and instructors was sorted into two subgroups according to a plan which provided for equal representation in each subgroup of each crew and each crew position. Subgroup E, designated as the experimental group, was administered the drug. Subgroup C, designated as the control group, was administered a placebo. The subjects were told that they might receive different dosages, but no subject knew whether he was being given the drug or the placebo. Two hours after take-off, crew 2 was forced to abort the mission and returned to base. An unexpected variation in the size of the crews (13 to 16 men instead of 15), together with the loss of this crew and the failure of three instructors to appear for testing, reduced the final sample for the experiment to 50 control and 51 experimental subjects. Subsequently, it was found that all of the test results were invalid for one control subject who had not followed the instructions, and that for one test two experimental subjects had invalid scores.

The eight crews took off at 15-minute intervals, 
beginning at 2010 hours on April 26, 1954. Prior to 
take-off each crew had been at the flight line for 
three hours for briefing, loading, and preflight in- 
spection. The seven crews completing the mission 
landed the next morning between 0730 and 0930 
hours. 

As soon as each crew landed, it taxied the aircraft to a parking area and reported directly to an assigned briefing room where a 45-minute battery of psychological tests was administered. Upon completion of the testing, each crew member was given either the drug or the placebo. Smoking was permitted, but none of the subjects was permitted to eat until after the experiment was concluded.

After administration of the drug, the crews re- 
turned to their aircraft for unloading and a post- 
flight inspection. This was expected to require at 
least an hour, but because of a heavy rain several 
of the crews were unable to display their gear for 
inspection; the unloading procedure was, therefore, 
quite rapid. The crews then returned to the briefing 
rooms for a postflight critique with their instructors. 
Upon completion of the critiques, the psychological 
tests were repeated, and the crews were dismissed. 

The following morning at 0930 the entire group 
was assembled in a large briefing room and given a 
follow-up questionnaire. 

The psychological tests administered after the drug 
had been taken were expected to reflect the feelings 
of the subjects at that time. Large mean differences 
in the predicted direction between the test scores of 
the experimental and control groups would have sug- 
gested that the drug was effective. The same tests 
were administered before the drug was taken so that 
each subject’s pre-drug feelings might be evaluated. 
This also permitted statistical control of the initial 
differences between the experimental and control 
groups which might have masked subsequent differences attributable to the drug.³
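The footnote attached to this paragraph sets a .50 threshold, and a small calculation shows where it comes from. Assume, as the footnote does, equal variances σ² at both testings and a pre-post correlation r. A pre-to-post difference score then has variance 2σ²(1 − r), which is smaller than the post-only variance σ² exactly when r > .50. The sketch treats the pre-drug score as a simple subtraction baseline. That is an assumption about what the footnote compares, not a description of the authors' computations.

```python
def post_only_variance(sigma2: float) -> float:
    """Variance of a post-drug score used alone."""
    return sigma2


def difference_score_variance(sigma2: float, r: float) -> float:
    """Var(post - pre) = sigma2 + sigma2 - 2*r*sigma2 when both testings have variance sigma2."""
    return 2 * sigma2 * (1 - r)


def pre_post_design_is_better(r: float, sigma2: float = 1.0) -> bool:
    """True when differencing against pre-drug scores reduces error variance."""
    return difference_score_variance(sigma2, r) < post_only_variance(sigma2)


for r in (0.3, 0.5, 0.7):
    print(r, difference_score_variance(1.0, r), pre_post_design_is_better(r))
```

At r = .50 the two variances are equal. Only above that point does the pre-drug testing pay for itself under these assumptions.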


Drug Dosage and Administration 


Huffman and his associates administered pregneno- 
lone methyl ether (PME) to patients in doses of 
160 to 320 mg. from one to four times per day. 
The average daily dosage was 500 mg. To facilitate 
absorption, Huffman prepared the PME as a solution of 25 mg./ml. in coconut oil plus a commercial emulsifier. This was administered in water.

For the purpose of Air Force application, it was 
considered important to administer the drug in a 
single dose, of high potency, which would have rapid 
effects. Accordingly, in the present experiment, 800 
mg., prepared as described above, were administered. 

After swallowing the oil, the subjects were given 
a glass of orangeade and a stick of chewing gum. 
The orange juice and chewing gum effectively offset 
the oily taste, and virtually all the subjects ac- 
cepted the dose without comment. 


Psychological Tests 


Since the drug was reported to influence subjective feelings of irritability, anxiety, and fatigue, the psychological test battery assembled for this experiment was designed to measure manifest affective reactions of these kinds. The question of possible side effects, such as the impairment of perceptual skills, judgment, reasoning, memory, and intellectual and psychomotor skills, was deferred pending confirmation of the earlier clinical observations of Campbell et al. (1). The experiment was planned with the assumption that the affective effects should be evaluated first, since this could be done more rapidly and efficiently, and that possible deleterious side effects should be studied only if positive findings were obtained on the affective tests. Affective relief or improvement can be accomplished by intellectual or motor impairment (as in the case of alcohol), but it is unlikely for intellectual or motor improvement to occur without some affective feedback to the subjects.

³ On the assumption of equal Ns, variances, and covariances for the groups, this design is superior to the use of post-drug test scores only when the correlation between pre- and post-drug test scores is above .50. This was the case in the present experiment.

The psychological tests are described in detail else- 
where (4). They were designed to obtain a meas- 
ure of the subjects’ subjective feelings “at this time” 
and to be capable of measuring changes over brief 
periods. The battery of six tests includes a percep- 
tual test, scored for threatening objects perceived; 
an adjective check list containing 35 pairs of self- 
description adjectives, an annoyance test, a question- 
naire composed of two anxiety scales, a controlled 
word-association test, and an attitude scale. Twelve
separate scores were computed from these six tests,
and the following effects of PME on them were ex- 
pected: 


                                                        Expected effect
Test scores                                             of PME
1. Total objects seen (test 1)                          Increase
2. Percent of threatening objects seen (test 1)         Decrease
3. Depressed affect (adjective) score (test 2)          Decrease
4. Annoyance score (test 3)                             Decrease
5. Annoyance add score (test 3)                         Decrease
6. Number of annoyance add items (test 3)               Decrease
7. Taylor scale of manifest anxiety (test 4)            Decrease
8. MMPI “Lie” scale (test 4)                            No change
9. SAM manifest anxiety score (test 4)                  Decrease
10. Cornell Word Form (test 5)                          Decrease
11. Tendency to agree (test 6)                          Decrease
12. Unwillingness to admit common frailties (test 6)    Increase


The follow-up questionnaire, administered the fol- 
lowing day, covered the following items: (a) how 
long the subject remained awake after leaving the 
hangar; (b) whether he found it more or less diffi- 
cult, or no difference noted, to fall asleep; (c) what 
changes in feelings were noticed at any time up to 
the meeting where the questionnaire was given; (d) 
a check list of adjectives, such as “tired,” “relaxed,” 
“depressed,” “irritable,” etc.; (e) an open-end question requesting additional comments.


Table 1
Pre-Drug and Post-Drug Mean Test Scores of Control Group and Experimental Group and the Mean Post-Drug Scores of the Two Groups Adjusted for Differences in Pre-Drug Meansᵃ

                                      Experimental              Control             Adjustedᵇ
Test                                                                             post-drug means
No.  Score Title                   Pre      Post    N      Pre      Post    N      Exp.     Control
1    Total Objects Seen            14.02**  19.76   51     14.47    21.92   49     19.99    21.67
1    % Threatening Objects           .23      .25   51       .22ᶜ     .21   49       .25      .22
2    Depressed Affect Score         7.47**   4.29   49      7.65**   4.45   49      4.34ᶜ    4.40
3    Annoyance Score               94.53**  89.86   51     92.67**  88.78   49     89.12ᶜ   89.56
3    Annoyance Add Score           15.74**  11.57   51     19.67**  12.49   49     13.16    10.84
3    No. of Annoying Add Items      4.57ᶜ    3.88   51      5.82**   3.88   49      4.31     3.43
4    Taylor Anxiety Score           8.27**   7.10   51      8.47*    7.61   49      7.19ᶜ    7.52
4    MMPI “Lie” Score               4.18ᵈ    4.16   51      3.92ᵈ    3.98   49      4.06     4.08
4    SAM Manifest Anxiety Score     7.84*    5.92   51      8.10**   6.39   49      6.00ᶜ    6.31
5    Cornell Word Form              4.37*    3.73   51      4.31*    3.71   49      3.37ᶜ    3.41
6    Tendency to Agree            165.86   167.51   51    165.45   167.27   49    167.51   167.42
6    Willingness to Admit
       Common Frailties            61.31ᶜ   62.37   51     61.24ᶜ   62.43   49     62.37    62.46

ᵃ Means were adjusted by the method suggested by McNemar (3, p. 328).
ᵇ None of the differences between adjusted means is significant at less than the .05 level.
ᶜ Differences between the means are in the predicted direction.
ᵈ Predicted not to shift.
* Difference between means is in predicted direction and significant at less than the .05 level.
** Difference between means is in predicted direction and significant at less than the .01 level.




Table 2
Comparison of Experimental and Control Groups on Follow-Up Questionnaire

                                                Experimental   Control
Variable                                         (48 cases)   (43 cases)   Significance
Mean number of minutes awake after
  leaving hangar                                   198.7         189.8           *
More difficulty falling asleep                       1             0
No difference in falling asleep                     36            29            **
Less difficulty falling asleep                      11            14
Number of favorable feeling items circled          5.92          5.05           *
Number of favorable additional comments            3.08          3.16           *

* t test not significant.
** χ² test not significant.
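Here is a minimal Pearson χ² computation on the three sleep rows of Table 2 (experimental 1/36/11, control 0/29/14). It shows the kind of check behind the “χ² test not significant” entry. The paper does not say exactly how the test was run, for example whether sparse categories were pooled, so this is an illustrative recomputation only. The “more difficulty” column has expected counts below 1, which makes the χ² approximation rough.

```python
def pearson_chi_square(table: list[list[int]]) -> tuple[float, int]:
    """Pearson chi-square statistic and degrees of freedom for an r x c table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence of group and response.
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return chi2, dof


# Rows: experimental, control. Columns: more / no difference / less difficulty.
sleep = [[1, 36, 11], [0, 29, 14]]
chi2, dof = pearson_chi_square(sleep)
# 5.991 is the usual tabled .05 critical value of chi-square with 2 df.
print(f"chi2 = {chi2:.2f}, df = {dof}, significant at .05: {chi2 > 5.991}")
```

The statistic comes out well under the 2-df critical value, which agrees with the table's entry.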


Results 


The results for the twelve test scores are 
summarized in Table 1. The mean raw 
scores on the Minnesota Multiphasic Person- 
ality Inventory “Lie” scale are 4.18 and 4.16, 
respectively, for the pre- and post-drug ad- 
ministrations for the experimental group and 
3.92 and 3.98, respectively, for the control 
group. These mean scores cluster closely 
around the 50th percentile of adult males and 
indicate appropriate test-taking attitudes on 
the part of the subjects. The ranges of “Lie” 
scores for both groups are within normal 
limits. 

A comparison of the pre- and post-drug
mean test scores for the experimental group
indicates that nine of the 11 mean scores⁴
shift in the predicted direction after adminis-
tration of the drug. Of these, six means
shifted to an extent significant at the .01
confidence level and one to an extent signifi-
cant at the .05 level. It may be inferred that
the shifts in mean scores are consistent with
the hypothesis that the putative effects of
PME are operative; however, the same gen-
eral effects are found for the control group,
with 10 of the 11 mean score shifts from
pre-drug to post-drug administration in the
predicted direction. Of these, 6 are signifi-
cant at the .01 level of confidence and 2 at
the .05 level. Furthermore, none of the dif-
ferences between the experimental and con-
trol groups in adjusted⁵ post-drug mean
scores approximates the .05 confidence level.

4 Excluding MMPI “Lie” scale, predicted not to
shift.


5 Mean scores were adjusted by method suggested 
by McNemar (3, p. 328). 
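Footnote 5 (like footnote a of Table 1) defers the adjustment to McNemar without restating it. As a minimal sketch, assuming the adjustment is the usual analysis-of-covariance correction, in which each group's post-drug mean is shifted along a pooled within-group regression on the pre-drug score to the grand pre-drug mean, it can be computed as below. The function name `adjusted_means` and the toy scores are hypothetical and are not the study's data.

```python
from statistics import mean


def adjusted_means(groups):
    """Covariance-adjusted post-test means (standard ANCOVA adjustment, assumed).

    groups maps a group name to (pre_scores, post_scores). Each group's post
    mean is moved along a pooled within-group regression line to the value
    expected at the grand mean of the pre-test scores.
    """
    # Pooled within-group sums of cross-products and squares (post on pre)
    sxy = 0.0
    sxx = 0.0
    for pre, post in groups.values():
        mx, my = mean(pre), mean(post)
        sxy += sum((x - mx) * (y - my) for x, y in zip(pre, post))
        sxx += sum((x - mx) ** 2 for x in pre)
    b_within = sxy / sxx

    # Grand mean of the covariate over all subjects in all groups
    grand_pre = mean([x for pre, _ in groups.values() for x in pre])

    return {
        name: mean(post) - b_within * (mean(pre) - grand_pre)
        for name, (pre, post) in groups.items()
    }


# Hypothetical toy data: the groups start at different pre-drug levels
# but follow the same pre/post relation.
toy = {
    "experimental": ([1, 2, 3], [2, 4, 6]),
    "control": ([3, 4, 5], [6, 8, 10]),
}
print(adjusted_means(toy))  # both adjusted means are 6.0
```

Because the toy groups differ only in where they started, the adjustment removes the apparent post-test difference; that is the kind of comparison the adjusted columns of Table 1 are meant to support.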


These results indicate that changes in feel- 
ings and affect are experienced by the crews 
between the time immediately after landing 
and one to two hours later. These changes 
have been noted frequently by crew mem- 
bers and may represent release from tension, 
relaxation, and change of task from flying to 
whatever they may be doing on the ground. 
Since no increment of improvement was found 
in the experimental group over the control 
group, it is necessary to reject the effects of 
the drug as a factor contributing to the 
changes observed. 

In Table 2 the results from the follow-up 
questionnaire are summarized. No significant 
differences were found, one day later, in re- 
ports made by experimental and control group 
crew members on the number of minutes they 
remained awake after leaving the hangar, 
difficulty in falling asleep, or general affect.
While there may not have been sufficient time 
allowed in this study for the absorption of 
the drug, this was not indicated by the re- 
sults of the follow-up. 
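The paper reports only that the χ² test on the falling-asleep counts was not significant; it does not give the statistic or say how the table was set up. As a sketch, assuming a plain Pearson χ² on the 3 × 2 table of counts from Table 2 (experimental 1, 36, 11; control 0, 29, 14), the check looks like this. `pearson_chi_square` is a hypothetical helper, and with a single subject in the "more difficulty" row two expected counts fall below 1, so the approximation is rough there.

```python
def pearson_chi_square(table):
    """Pearson chi-square statistic and degrees of freedom for an r x c table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return chi2, df


# Rows: more / no / less difficulty falling asleep.
# Columns: experimental (48 cases), control (43 cases).
sleep = [[1, 0], [36, 29], [11, 14]]
chi2, df = pearson_chi_square(sleep)
print(f"chi-square = {chi2:.2f}, df = {df}")
```

Under these assumptions the statistic comes out near 1.84 on 2 df, well short of the 5.99 needed at the .05 level, which is in line with the reported non-significant result.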

Because of the variation in time between 
drug administration and post-drug testing, as 
explained earlier, the pre- and post-drug 
scores were compared for the two crews with 
the shortest elapsed time (less than 63 min- 
utes) and for the two crews with the longest 
(greater than 94 minutes). Although the 
trends observed in these data were consistent 
with those obtained for the total sample, 
there were no systematic differences which 
could be attributed to the differences in 
elapsed time. 

An additional interesting finding is that in-
creased intestinal motility on the day of the
experiment was reported by 17 subjects on
the follow-up questionnaire. Most of the 
subjects attributed this motility to the oily 
emulsion in which the drug was administered. 
Of this group, nine were from the experimen- 
tal group and eight were from the control 
group. 
Conclusion 


Evidence obtained from this experiment 
does not support the claims made for PME 
by Huffman and his associates. Admittedly, 
there are significant differences between the 
conditions of the present experiment and the 
clinical conditions in which Huffman’s ob- 
servations were made. The subjects of this 
investigation were healthy, robust, active fly- 
ers, observed after a fatiguing and stressful 
training mission, while Huffman’s subjects 
were seriously disturbed mental patients. 
Concentrated dosage was used in the present 
study, while the dosage in Huffman’s study 
was diffused over a longer period of time. 
Whether or not an extended period of medi- 
cation would have been more effective is a 
matter of speculation. If such a regimen is 
required, however, it would seriously limit
the usefulness of the drug among crew personnel.
Finally, the present evaluation of the drug is 
based on rigorous objective testing, quantita- 
tively evaluated, while Huffman’s impressions 
are subjective and clinical. Although the
conditions of the present experiment did pro-
duce severe subjective feelings of irritability,
depression, and anxiety among the crews, and 
the changes observed through the test bat- 
tery did reflect improvement during the hours 
after landing, the subjects who received PME 
did not improve to a greater extent than did 
those given the placebo. This study does not 
support the use of PME for the alleviation of 
depression, irritability, and anxiety feelings 
of crew members. 


Received December 19, 1955. 


References 


1. Campbell, C. H., Huffman, M. N., & Sleeper,
H. G. The effects of pregnenolone methyl
ether in psychiatric patients. Paper presented
before Mid-Continent Psychiatric Association,
Kansas City, Mo., Sept. 1953.

2. Goldfain, E., & Huffman, M. N. A study of the
action of pregnenolone methyl ether in pa-
tients with rheumatoid arthritis. Acta med.
Scandinav., 1954, 147, Parts 5-6. Pp. 455-
458.

3. McNemar, Q. Psychological statistics. New York:
Wiley, 1949.

4. Sells, S. B., Barry, J. R., Trites, D. K., & Chinn,
H. I. A test of the effects of pregnenolone
methyl ether on subjective feelings of B-29
crews after a twelve-hour mission. USAF
Sch. Aviat. Med. Rep., 1955 (Rep. No. 55-11).

5. Sleeper, H. G. Experimental use of pregnenolone
methyl ether in treating psychiatric symptoms.
Dis. Nerv. System, 1955, 16, 93-94.


Journal of Applied Psychology
Vol. 40, No. 6, 1956


The Effect of Scale Interval Length and Pointer Clearance on 
Speed and Accuracy of Interpolation¹,²


A. V. Churchill 


Defence Research Medical Laboratories, Toronto, Canada 


A number of studies have been reported on 
the effect of scale-interval length on inter- 
polation accuracy. Reports by Grether and 
Williams (2, 3), Kappauf and Smith (4, 5), 
and Leyzorek (6) indicate that the accuracy 
of interpolation between scale marks is de- 
pendent upon the separation of the marks. 
These studies agree that the best interval for 
interpolation lies between 0.5 and 1.0 inch, 
at normal reading distances. The displays 
studied were composed of more than one scale 
interval and thus the subjects’ task involved 
both scale reading and interpolation. 

No reports on the effect of pointer clear- 
ance (the distance between the pointer tip 
and the scale mark, in the plane of the scale) 
and interpolation have been uncovered. On 
the accuracy of reading straight scales to the 
nearest scale mark, Vernon (7) has reported 
that “up to a clearance of 0.7 inch errors do 
not increase appreciably.” For curved scales 
Woodson (8, pp. 1, 8) has recommended a 
maximum clearance of j; inch between 
pointer and scale mark. 

The experiment reported here was designed 
to reveal the effect of interval length and 
pointer clearance on the speed and accuracy 
of interpolating to tenths of a scale interval. 
The data also disclosed systematic changes in 
the direction of errors and relationships be- 
tween accuracy of initial and subsequent re- 
sponses and between orders of presentation of 
scale intervals. 

Method 
Apparatus 


Single-scale intervals were used in order to elimi-
nate the effects of scale reading as such. The inter-
vals were 0.25, 0.50, 0.75, 1.0, 1.5, and 2.0 inches.
Each interval consisted of a horizontal reference line
0.03 inch thick, with scale marks 0.03 inch thick and
0.20 inch long at the extremities. They were drawn
in black on individual white cards. The pointer was
0.03 inch wide at the tip, and was adjustable to give
clearances of 0.0, 0.125, 0.25, 0.50, 1.0, and 2.0 inches
between the pointer tip and the scale reference line.

1 Defence Research Medical Laboratories Report
No. 164-4, Project No. D77-94-20-27, H.R. No. 125.
2 A set of 18 data and analysis tables has been de-
posited with the American Documentation Institute.
Order Document No. 5041 from ADI Auxiliary Pub-
lications Project, Photoduplication Service, Library
of Congress, Washington 25, D. C., remitting in ad-
vance $1.75 for microfilm or $2.50 for photocopies.
Make checks payable to Chief, Photoduplication
Service, Library of Congress.

The S viewed the display through an aperture. A 
shutter was placed between the aperture and the dis- 
play. A chin rest served to maintain a constant 28- 
inch viewing distance, and controlled eye level so 
that approximately one inch of the pointer was 
visible for all presentations. Displays were at right 
angles to the line of sight. The scales which the 
experimenter used for setting the pointer were ex-
panded to a seven-to-one ratio and marked off in
tenths.

In the first half of the experiment the shutter was 
opened by the experimenter. The S ended the ex- 
posure by pressing a microswitch. Exposure time was 
measured by an electric timer which was operated 
by the shutter. In the second half of the experi- 
ment the shutter was operated by an interval timer, 
giving a 0.3-second exposure. 

Throughout both parts of the experiment the illu- 
mination was 180 footcandles at the display, as 
measured by the Macbeth illuminometer. 


Procedure 


Ten laboratory employees served as Ss. Each S 
was presented with the six scale intervals in random 
order. The six pointer clearances were presented in 
random order for each interval. The nine inter- 
polated pointer positions were presented twice under 
each of the 36 conditions. Each series of 18 settings 
was randomized. 
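The design implies 6 intervals × 6 clearances × 18 settings (nine positions, each twice) = 648 settings per S, and 10 Ss × 18 = 180 readings per condition, matching the count given under Results. A minimal sketch of one S's presentation order, assuming the clearance order is drawn afresh within each interval; the function name and the seed are hypothetical.

```python
import random
from collections import Counter

INTERVALS = [0.25, 0.50, 0.75, 1.0, 1.5, 2.0]    # scale-interval lengths, inches
CLEARANCES = [0.0, 0.125, 0.25, 0.50, 1.0, 2.0]  # pointer clearances, inches
POSITIONS = range(1, 10)                         # interpolated tenths 1..9


def subject_schedule(rng):
    """One S's presentation order as (interval, clearance, position) tuples."""
    schedule = []
    # The six intervals in random order
    for interval in rng.sample(INTERVALS, len(INTERVALS)):
        # The six clearances in a fresh random order within each interval
        for clearance in rng.sample(CLEARANCES, len(CLEARANCES)):
            # Nine positions presented twice: a randomized series of 18 settings
            series = [p for p in POSITIONS for _ in range(2)]
            rng.shuffle(series)
            schedule.extend((interval, clearance, p) for p in series)
    return schedule


sched = subject_schedule(random.Random(1))
print(len(sched))                                         # 648 settings for one S
print(Counter((i, c) for i, c, _ in sched)[(1.0, 0.0)])   # 18 per condition
```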

The S was instructed in the task and shown a 
sample display. Readings were made under “speed 
and accuracy” instructions, and each exposure was 
preceded by a “Ready” signal.


Results 


Since the results show identical trends 
whether considering error frequency or mag- 
nitude, the tabulations and analyses presented 
here are based on error frequencies. The 
tabulations based on error magnitude are not 
included in this report.³

Table 1 shows the total error frequencies 


3 Tables 13-20; see footnote 2. 




Table 1 


The Effect of Scale Interval Length and Pointer 
Clearance on the Frequency of 
Interpolation Errors 


(Subject-controlled exposure time) 


Pointer       Scale Interval Length (inches)
Clearance
(inches)      0.25   0.50   0.75   1.0   1.5   2.0   Total
0.0             75     39     43    25    22    19     223
0.125           81     28     36    24    20    21     210
0.25            96     53     41    28    14    21     253
0.50            88     64     50    27    22    23     274
1.00           103     74     41    26    32    24     300
2.00           109     79     60    49    38    30     365
Total          552    337    271   179   148   138


for 10 Ss under S-controlled exposure time. 
One hundred eighty readings were made un- 
der each of the 36 conditions. 

The data from Table 1 were transformed
to satisfy the assumptions of analysis of
variance (10), using $y = 2\sin^{-1}\sqrt{\%\ \text{error}}$ (9),
and an analysis of variance was performed
on the transformed data. The results of the
analysis are shown in Table 2.
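A minimal sketch of this transformation applied to Table 1, assuming each cell's proportion is its error count over the 180 readings made under that condition and that the angle is in radians (the paper cites Hald's tables but does not state the unit). The first row is taken as 75, 39, 43, 25, 22, 19, the only values consistent with its printed row total (223) and the printed column totals; the helper name `angular` is hypothetical.

```python
import math

READINGS_PER_CELL = 180  # 10 Ss x 9 pointer positions x 2 presentations

# Error frequencies from Table 1 (S-controlled exposure time).
# Rows: pointer clearance 0.0, 0.125, 0.25, 0.50, 1.00, 2.00 in.
# Columns: scale interval 0.25, 0.50, 0.75, 1.0, 1.5, 2.0 in.
ERRORS = [
    [75, 39, 43, 25, 22, 19],
    [81, 28, 36, 24, 20, 21],
    [96, 53, 41, 28, 14, 21],
    [88, 64, 50, 27, 22, 23],
    [103, 74, 41, 26, 32, 24],
    [109, 79, 60, 49, 38, 30],
]


def angular(count, n=READINGS_PER_CELL):
    """y = 2 * arcsin(sqrt(p)) for an error proportion p = count / n, in radians."""
    return 2 * math.asin(math.sqrt(count / n))


transformed = [[angular(c) for c in row] for row in ERRORS]

# The printed row and column totals serve as a check on the transcription
row_totals = [sum(row) for row in ERRORS]
col_totals = [sum(col) for col in zip(*ERRORS)]
print(row_totals)  # [223, 210, 253, 274, 300, 365]
print(col_totals)  # [552, 337, 271, 179, 148, 138]
```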

From the analysis it will be seen that the 
decrease in errors as pointer clearance is re- 
duced is significant at the .01 level. The 
deviations from regression are not significant. 


Table 2

Analysis of Variance of Error Data Presented
in Table 1

(Data from Table 1 transformed: $y = 2\sin^{-1}\sqrt{\%\ \text{error}}$)

Source                           df    Mean Square         F
Among pointer clearances
  due to regression               1        .4261      47.34**
  deviations from regression      4        .0060         .67
Among interval lengths
  due to regression               1       3.1203     346.70**
  deviations from regression      4        .0393       4.37**
Error                            25        .0090
Total                            35

** Significant at the .01 level.


[Fig. 1 plot: curves labeled Part I Error, Part I Time, and Part II Error, plotted against scale-interval length (0.25 to 2.0 in.) and pointer clearance (2.0 to 0 in.).]

Fig. 1. The effect of scale-interval length (S.I.L.)
and pointer clearance (P.C.) on interpolation time
and error.


The decrease in errors as the scale-interval 
length is increased is significant at the .01 
level. The deviations from regression indi- 
cate that heterogeneity is still present. 
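Table 2 splits each main effect into a 1-df linear regression and 4-df deviations, with the 25-df interaction as error. The sketch below redoes that partition from Table 1, assuming the transformation is applied cell by cell in radians and that the regressors are the codings given in the text (x = pointer clearance × 8; log of scale-interval length × 4; the base of the logarithm does not affect the sums of squares). Whether it matches the printed mean squares exactly depends on those assumptions, so treat its output as a check, not as the author's computation. `split_effect` is a hypothetical helper.

```python
import math

N = 180  # readings per cell
ERRORS = [
    [75, 39, 43, 25, 22, 19],
    [81, 28, 36, 24, 20, 21],
    [96, 53, 41, 28, 14, 21],
    [88, 64, 50, 27, 22, 23],
    [103, 74, 41, 26, 32, 24],
    [109, 79, 60, 49, 38, 30],
]
CLEARANCE_X = [c * 8 for c in (0.0, 0.125, 0.25, 0.50, 1.0, 2.0)]
INTERVAL_X = [math.log10(s * 4) for s in (0.25, 0.50, 0.75, 1.0, 1.5, 2.0)]

y = [[2 * math.asin(math.sqrt(c / N)) for c in row] for row in ERRORS]
rows, cols = len(y), len(y[0])
grand = sum(map(sum, y)) / (rows * cols)
row_means = [sum(r) / cols for r in y]
col_means = [sum(y[i][j] for i in range(rows)) / rows for j in range(cols)]


def split_effect(means, xs, n_per_mean):
    """Split a main-effect SS into linear regression on xs and deviations."""
    m = sum(means) / len(means)
    mx = sum(xs) / len(xs)
    ss_effect = n_per_mean * sum((v - m) ** 2 for v in means)
    sxy = sum((x - mx) * (v - m) for x, v in zip(xs, means))
    sxx = sum((x - mx) ** 2 for x in xs)
    ss_reg = n_per_mean * sxy ** 2 / sxx
    return ss_reg, ss_effect - ss_reg


ss_total = sum((v - grand) ** 2 for r in y for v in r)
ss_rows = cols * sum((m - grand) ** 2 for m in row_means)
ss_cols = rows * sum((m - grand) ** 2 for m in col_means)
df_error = (rows - 1) * (cols - 1)                     # 25
ms_error = (ss_total - ss_rows - ss_cols) / df_error   # interaction as error

clr_reg, clr_dev = split_effect(row_means, CLEARANCE_X, cols)
int_reg, int_dev = split_effect(col_means, INTERVAL_X, rows)

for label, df, ss in [
    ("clearance, regression", 1, clr_reg),
    ("clearance, deviations", rows - 2, clr_dev),
    ("interval, regression", 1, int_reg),
    ("interval, deviations", cols - 2, int_dev),
]:
    ms = ss / df
    print(f"{label:22s} df={df}  MS={ms:.4f}  F={ms / ms_error:.2f}")
print(f"{'error':22s} df={df_error}  MS={ms_error:.4f}")
```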

The mean times for each condition under
S-controlled exposure time were tabulated
and analyzed.⁴ The results of the analysis
of variance show the same relationships as
presented in Table 2, significant at the .01
level.

The error frequencies for controlled (0.3
sec.) exposure time were tabulated and ana-
lyzed.⁵ Analysis of variance shows the same
results as the analysis of the S-controlled ex-
posure time data, significant at the .01 level.

For the pointer clearance time and error
data the line fitted is $y = b_0 + b_1 x$, where
$y = 2\sin^{-1}\sqrt{\%\ \text{error}}/6$ and $x$ = (pointer
clearance × 8). The curvilinear relationship
between time and error and interval length
is represented by the function $y = b_0 + b_1 \log x$,
where $y = 2\sin^{-1}\sqrt{\%\ \text{error}}/6$ and $x$ =
(scale interval length × 4). The heterogeneity
indicated by the significance of the deviations
from regression is not constant. These
relationships may be seen more clearly in the
distributions of time and error which are pre-
sented in Fig. 1.

4 Tables 3-4; see footnote 2.
5 Tables 5-6; see footnote 2.
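The text gives the form of both fitted functions but not the fitting rule. Assuming ordinary least squares on the six transformed level means $y_j$ (one per pointer clearance, or one per interval length), the coefficients would be

$$
b_1 = \frac{\sum_{j=1}^{6} (x_j - \bar{x})(y_j - \bar{y})}{\sum_{j=1}^{6} (x_j - \bar{x})^2},
\qquad
b_0 = \bar{y} - b_1 \bar{x},
$$

with $x_j$ = pointer clearance × 8 (that is, 0, 1, 2, 4, 8, 16) for the clearance line, and with $x_j$ replaced by $\log x_j$, where $x_j$ = scale interval length × 4 (1, 2, 3, 4, 6, 8), for the interval curve.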

The curves presented in Fig. 1 demon- 
strate the effect of scale-interval length and 
pointer clearance on reading time and error. 
As will be seen from the graph, time and 
error decrease as the scale-interval length is 
increased, and as pointer clearance is de- 
creased. There is a general trend toward a 
change in the curves at the 2-inch interval 
length and the zero pointer clearance. 


Discussion 


During the administration of the experi- 
ment and the tabulation of the data a num- 
ber of interesting relationships were observed. 

1. It was noted that there tended to be 
more errors in the remaining responses to a 
group of settings if the initial response was 
incorrect. Errors were tabulated⁶ for the re-
maining responses to groups of settings in
which the initial response was correct or in- 
correct. The results showed that signifi- 
cantly more errors were made when the initial 
response was incorrect. This relationship was 
not characteristic of individual subjects, ini- 
tial pointer positions, scale intervals, or 
pointer clearances. 

2. The experiment was designed with the 
order of presentation of scale intervals ran- 
domized. While tabulating the data it was 
observed that the short scale intervals ap- 
peared to have an adverse effect on longer 
intervals when they preceded the longer in-
tervals. Errors were tabulated for scale in-
tervals for the various presentation positions,⁷
and the tabulations revealed a tendency to- 
ward more errors on the longer scale inter- 
vals when they followed the shorter intervals 
than when they preceded the shorter intervals. 

3. The tabulation of the data showed an
apparent relationship between the interval
length and the direction of errors, toward
the interval extremes and toward the interval
mid-point (i.e., extremes, 1 and 9; mid-point,
5). The data were tabulated in terms of the
direction of error for interval lengths and
pointer clearances. Ratios of errors toward
the extremes to errors toward the mid-point
were calculated.⁸ This ratio is large at the
0.25-inch interval (a mean of 2.74 for the
two parts of the experiment), diminishes as
the interval is lengthened up to 1.0 inch (a
mean of 1.22), then drops below one (i.e.,
the majority of errors tend toward the mid-
point), giving a mean ratio of .39 at the 2.0-
inch interval.

6 Tables 7-10; see footnote 2.
7 Table 11; see footnote 2.

When comparing direction of errors with 
pointer clearances it was found that the ratio 
is large at zero clearance (a mean of 3.93), 
diminishes as the clearance is increased up to 
1.0 inch (a mean of 1.08), then drops below 
one at the 2.0-inch clearance (a mean of .76). 
Carr and Garner (1) noted a similar change 
in the direction of errors when scale intervals 
ranging from 0.5 to 25 mm. were interpolated 
in one-hundredths. 
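The paper does not spell out how each error was assigned a direction. One plausible rule, assumed in this hypothetical sketch, compares distances from the mid-point (5) before and after the error: a reading farther from 5 than the true position counts as an error toward the extremes, a nearer one as an error toward the mid-point, and a reading equally far on the other side is left unclassified. The function names and the demo pairs are illustrative, not data from the study.

```python
import math

MIDPOINT = 5  # interval mid-point in tenths; the extremes are 1 and 9


def error_direction(true_pos, response):
    """Return 'extreme', 'midpoint', or None for one reading (assumed rule)."""
    if response == true_pos:
        return None  # correct reading, not an error
    before = abs(true_pos - MIDPOINT)
    after = abs(response - MIDPOINT)
    if after > before:
        return "extreme"
    if after < before:
        return "midpoint"
    return None  # e.g. 4 read as 6: equally far from 5, direction undefined


def extreme_to_midpoint_ratio(readings):
    """Ratio of errors toward the extremes to errors toward the mid-point."""
    counts = {"extreme": 0, "midpoint": 0}
    for true_pos, response in readings:
        direction = error_direction(true_pos, response)
        if direction is not None:
            counts[direction] += 1
    if counts["midpoint"] == 0:
        return math.inf
    return counts["extreme"] / counts["midpoint"]


# Hypothetical (true position, response) pairs
demo = [(2, 1), (8, 9), (3, 2), (6, 5), (4, 6), (5, 5)]
print(extreme_to_midpoint_ratio(demo))  # 3.0
```

A ratio above one, as in this toy case, corresponds to the pattern reported for the short intervals and small clearances; a ratio below one corresponds to the long intervals and large clearances.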


Summary 


1. Reading time and errors of interpolation 
decrease significantly as the scale-interval 
length is increased from 0.25 to 1.5 inches, 
with no improvement at the 2.0-inch interval 
length. 

2. Reading time and errors of interpola- 
tion decrease significantly as the pointer 
clearance is reduced from 2.0 to 0.125 inches, 
with no improvement at zero clearance. 

3. If the response to the first reading of a 
group is incorrect, there is a tendency to- 
ward more errors on the remaining readings 
in that group than there are when the initial 
response is correct. 

4. There is a tendency toward increased 
errors on a scale interval of a given length if 
it is preceded by a shorter scale interval. 

5. The majority of errors tend toward the 
interval extremes on the short scale intervals 
and pointer clearances, and toward the inter- 
val mid-point on the long scale intervals and 
pointer clearances. One inch appears to be
the transition point for both scale-interval
length and pointer clearance.

8 Table 12; see footnote 2.


Received March 15, 1956. 


References 


1. Carr, W. J., & Garner, W. R. The maximum
precision of reading fine scales. J. Psychol.,
1952, 34, 85-94.

2. Grether, W. F., & Williams, A. C., Jr. Speed
and accuracy of dial reading as a function of
dial diameter and spacing of scale divisions.
USAF Air Materiel Command, Engng. Div.,
Aero Med. Lab. Memo Rep., 1947, No.
TSEAA-694-1E.

3. Grether, W. F., & Williams, A. C., Jr. Psycho- 
logical factors in instrument reading. II. The 
accuracy of pointer position interpolation as 
a function of the distance between scale marks 
and illumination. J. appl. Psychol., 1949, 33, 
594-604. 

4. Kappauf, W. E., & Smith, W. M. Design of in-
strument dials for maximum legibility. II. A
preliminary experiment on dial size and gradu-
ation. Dayton, O.: USAF Air Materiel Com-
mand, Wright-Patterson AFB, 1948. (AF
Tech. Rep. No. 5914, Pt. 2.)

5. Kappauf, W. E., & Smith, W. M. Design of in- 
strument dials for maximum legibility. IV. 
Dial graduation, scale range and dial size as 
factors affecting the speed and accuracy of 
scale reading. Dayton, O.: USAF Air Ma- 
teriel Command, Wright-Patterson AFB, 1950. 
(AF Tech. Rep. No. 5914, Pt. 4.) 

6. Leyzorek, M. Accuracy of visual interpolation 
between circular scale markers as a function 
of the separation between markers. J. exp. 
Psychol., 1949, 39, 270-279. 

7. Vernon, M. D. Scale and dial reading. Med.
Res. Council, Unit appl. Psychol. (Cam-
bridge) Rep., 1946, No. MRC/APUR 49.

8. Woodson, W. E. Human engineering guide for 
equipment designers. San Diego, Calif.: 
U. S. Navy Electronics Laboratory, 1954. 

9. Hald, A. Statistical tables and formulas. New 
York: Wiley, 1952. 

10. Quenouille, M. H. Introductory statistics. Lon- 
don: Butterworth-Springer, 1950. 


Journal of Applied Psychology 
Vol. 40, No. 6, 1956 


Transfer of Training Between
Quickened and Unquickened
Tracking Systems¹


James G. Holland and Jean B. Henson 


Naval Research Laboratory 


Tracking systems differ widely in the effect 
that movement of the control has upon the 
displayed tracking error. Some systems have 
a tight display-control relationship and pro- 
vide a positional displacement of the target 
proportional to the positional displacement of 
the stick. Many other conventional systems 
provide an acceleration of the displayed tar- 
get for a positional input at the control. 
Such systems have a loose display-control re- 
lationship and require the operator to antici- 
pate the effects of his response and to make 
a countermovement after a given response 
but before the displayed error reaches zero 
in order to avoid overshooting. Birmingham 
and Taylor (1), however, have demonstrated 
that when position and velocity information 
are added to the display in the proper propor- 
tions, the operator is no longer required to 
anticipate the results of his previous move- 
ments of the control. Instead, he has only 
to make his immediate movements propor- 
tional to the displayed error. Since the op- 
erator has instantaneous knowledge of re- 
sults, the system is said to be quickened as 
opposed to the unquickened system, in which 
knowledge of results is delayed. 

Before quickening is adopted in any prac-
tical situation, it is desirable to determine
(a) if operators trained on an unquickened
system will experience habit interference when
forced to use a quickened system, (b) if their
initial performance on the quickened system
will suffer seriously when compared with their
typical performance on the unquickened sys-
tem with which they are proficient, and (c)
if operators experienced with a quickened sys-
tem will be handicapped when they have to
switch to an unquickened system.

1 The opinions or assertions contained herein are
the private ones of the writers and are not to be
construed as official or reflecting the views of the
Navy Department or the naval services at large.
The material discussed in this paper has previ-
ously been presented as NRL Technical Report No.

Much research (3) has shown a relation- 
ship between similarity of stimuli and the 
transfer of training. When a new response is 
learned to an old or similar stimulus, nega- 
tive transfer is expected due to the inter- 
ference provided by the arousal of the old, 
and now erroneous, response. This condition 
might appear to prevail between a quickened 
and an unquickened task. To the naive op- 
erator the displays might seem to be very 
much alike, since in both cases only the track- 
ing error is presented. When an operator has 
been trained on a quickened system, his re- 
sponse should come to be directly propor- 
tional to the displayed error; but if switched 
to the unquickened system, proportioning the 
responses to the displayed error would be 
detrimental. Thus, negative transfer might 
be expected when an operator trained on a 
quickened system is transferred to an un- 
quickened system, or vice versa. 


Method 


Apparatus. The tracking task was of the com- 
pensatory type. The S was required to keep a dot 
on a cathode-ray tube centered on a hairline by 
manipulating a joy stick. The dot was free to move 
only in the horizontal plane. The dot was forced 
off the hairline by a sine wave of three cycles per 
min. generated by an analog computer. Despite the 
regularity of the course, the displayed error ap- 
peared erratic to S since it was the difference be- 
tween the course and the control output. To pre- 
vent S from anticipating the initial direction of 
movement, the polarity of the course was reversed 
randomly between trials. 

The control was a spring-restrained joy stick. 
Movement of the stick deflected the plate of a 
vacuum-tube strain gauge providing a voltage pro- 
portional to the stick deflection. In the unquick- 
ened system this voltage was fed through two inte- 
grators of an analog computer and then combined 
with the course and fed into the display (Fig. 1).
Thus, a deflection of the stick resulted in an ac-
celeration of the dot on the scope. A stick deflec-
tion of 1 cm. resulted in an acceleration of 16 cm./
sec.² in the dot. The quickened system had two
feedforward loops: one around both integrators
added position information, and the other around
the second integrator added velocity information.
A stick deflection of 1 cm. resulted in a displayed
displacement of 2 cm., a velocity of 8 cm./sec., and
an acceleration of 16 cm./sec.². Thus, the relation
of position, velocity, and acceleration was the 1:4:8
relation which previous research (4) has shown to
be optimal.

[Fig. 1 block diagram: course, display, man, and control, with error integrators; switch positions Q and U.]

Fig. 1. Simplified block diagram of tracking ap-
paratus. Switch position Q provided the quickened
system and switch position U provided the unquick-
ened system.
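A minimal discrete-time sketch of the two control laws described above, assuming simple forward-Euler integration and the gains given in the text (16 cm./sec.² per cm. of stick for the acceleration term, plus 8 cm./sec. and 2 cm. for the velocity and position terms in the quickened case). It models only the control path; in the compensatory display the dot would show the course minus this output. The function name and the step size are illustrative.

```python
def control_output(stick, dt, quickened):
    """Displayed control output (cm.) for a sampled stick deflection (cm.).

    Unquickened: 16 x the double integral of stick deflection (acceleration
    control). Quickened: also adds 8 x the single integral (velocity term) and
    2 x the deflection itself (position term), giving the 1:4:8 proportions.
    """
    first = 0.0   # running integral of stick deflection
    second = 0.0  # running double integral
    out = []
    for s in stick:
        value = 16 * second
        if quickened:
            value += 8 * first + 2 * s
        out.append(value)
        # Forward-Euler updates (second uses the old value of first)
        second += first * dt
        first += s * dt
    return out


dt = 0.001
step = [1.0] * 1001  # a 1-cm. stick deflection held for 1 sec.
quick = control_output(step, dt, quickened=True)
slow = control_output(step, dt, quickened=False)
print(quick[0], slow[0])  # 2.0 0.0: only the quickened display moves at once
print(round(quick[-1], 2), round(slow[-1], 2))  # about 2 + 8t + 8t^2 and 8t^2 at t = 1
```

The immediate position term is what gives the quickened operator instantaneous knowledge of results; the unquickened output starts from zero and builds up only through the integrators.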

Scoring was accomplished by using a device which 
averaged the displayed error on each trial without 
regard to sign. 
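The scoring device is described only as averaging the displayed error without regard to sign. A sampled analogue, assuming the error is recorded at regular intervals during a trial, is the mean absolute value; the sampling rate and the sinusoidal test trace below are hypothetical.

```python
import math


def average_unsigned_error(samples):
    """Mean of the displayed error over a trial, ignoring sign."""
    return sum(abs(e) for e in samples) / len(samples)


# Hypothetical error trace: a 40-sec. trial sampled 10 times per sec.,
# with the error swinging sinusoidally through two full cycles
trace = [math.sin(2 * math.pi * 0.05 * (k / 10)) for k in range(400)]
print(round(average_unsigned_error(trace), 3))  # close to 2/pi
```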

Procedure. Twenty-four naval enlisted men, hav-
ing combined GCT and ARI scores of 120, served
as Ss. They were divided into four groups of six
Ss each. Two groups were trained on the unquick-
ened system: one of these received 140 40-sec. learn-
ing trials, and the other received 260 40-sec. learning
trials. The remaining two groups were trained on
the quickened system: one received 140 trials and
the other received 260 trials. Ten trials were given
in each experimental session. There were approxi-
mately 40 sec. between trials within each session and
a minimum of 20 min. between successive sessions.

After an S had completed his training trials he
was switched to the other system, i.e., an S trained
with the unquickened system was tested on the
quickened system and one trained with the quick-
ened system was tested on the unquickened system.
Transfer of training was evaluated by comparing 
the performance during the initial test session with 
the first training session of the 12 Ss who originally 
were trained on the system in question. 

After all Ss had been given 80 40-sec. trials on 
the test condition, they were switched back to the 
system with which they were originally trained. 
This was done to ascertain the extent of interfer- 
ence provided by the intervening practice with a 
different system. 


Results 


The results are summarized in Fig. 2.
Plotted on the abscissa are successive ses-
sions, each session containing ten 40-sec.
trials. On the ordinate are plotted average
integrated error scores in arbitrary units.
Each point represents a mean of ten trials for
six Ss. The broken-line curves represent the
performance of the two groups trained initi-
ally on the quickened system, while the solid-
line curves represent the performance of the
two groups trained initially on the unquick-
ened system. The black circles represent
groups with the greater amount of training
(i.e., 26 sessions), and the white circles rep-
resent groups with the lesser amount of train-
ing (i.e., 14 sessions). The vertical lines
mark the points at which the Ss switched sys-
tems (session 27 for the high-trained groups
and session 15 for the low-trained groups).
The Ss trained on the unquickened system at
these points began using the quickened sys-
tem, while those trained on the quickened
system began using the unquickened system.

[Fig. 2 plot: average integrated error against sessions for the four training conditions (unquickened high, unquickened low, quickened high, quickened low), with the transfer trials for the low- and high-training groups marked.]

Fig. 2. Average integrated error scores, in arbitrary units, for the four experimental conditions as a function of successive sessions. Each point represents the mean of ten trials for each of six Ss.

The extent of the transfer can be seen
by comparing the initial transfer session
with the first session during the training pe-
riod for the two groups beginning their train-
ing with the system in question. When ses-
sion 15 for the group with the low degree of
training on the unquickened system is com-
pared with the first session of the two groups
trained on the quickened system, it is seen
that positive transfer occurred. The mean
integrated error score for the first training
session of all Ss initially trained with the
quickened system is 7.28, while the mean for
the first transfer session of the group with
the low degree of training on the unquick-
ened system is 5.09. These points are sig-
nificantly different at the .05 level, indicating
that the transfer effect is different from zero.
The transfer session is also significantly dif-
ferent from the mean integrated error score
of 3.94 found during session 14 for the quick-
ened groups. Thus, while significant positive
transfer was obtained, the transfer was not
complete.

When the group receiving the greater
amount of training on the unquickened con-
dition was switched to the quickened condi-
tion, there was again an indication of positive
transfer. An average integrated error score
of 5.70 was obtained on the first transfer ses-
sion as compared with 7.28 for Ss lacking the
preceding experience with the unquickened
display. However, these two points are not
significantly different; therefore, it cannot be
concluded that transfer was greater than zero.
A comparison of the two transfer sessions is
not significant either, so there is no basis for
the conclusion that the differing amounts of
training provide different degrees of transfer.

In the case of switching from the quickened
to the unquickened display, transfer is posi-
tive for both groups; and in both cases the
difference is statistically significant at the .05
level. Also, in both cases the transfer is not
complete. The average integrated error score
for naive Ss is 18.30 as compared with 10.87
for those having first had 14 sessions with
the quickened system and 11.82 for those
having first had 26 sessions with the quick-
ened system. The transfer points for these
two groups did not differ significantly from
each other, so again there is no basis for con-
cluding that the different extents of training
result in differing amounts of transfer.

Table 1 presents the percentage of trans-
fer scores. These scores are obtained by the
following formula:

$$\text{Per cent transfer} = \frac{\text{scores for inexperienced Ss} - \text{scores for transferred Ss}}{\text{scores for inexperienced Ss} - \text{scores at asymptote of learning}} \times 100$$

Zero per cent transfer would mean the per-
formance is the same on the transfer day as
that obtained by inexperienced Ss. One hun-
dred per cent transfer would mean that the
score on the first transfer session is the same
as the scores obtained at the asymptote of
learning. For both directions of transfer there
is somewhat less positive transfer for the
higher amount of training, but these differ-
ences are not significant.

Table 1

Per Cent Transfer Scores for the Four
Experimental Conditions

                 Transfer Scores (Per Cent)
Degree of        Quickened to     Unquickened
Training         Unquickened      to Quickened
Low              58               64
High             51               46
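The per cent transfer formula can be written as a small function. The naive (7.28) and transferred (5.09) scores below are those reported in the text for the low-training group switched from the unquickened to the quickened system; the asymptote score is not reported, so the session-14 mean of the quickened groups (3.94) stands in for it here as an assumption. That substitution gives about 65.6 per cent rather than the 64 shown in Table 1, presumably because the authors estimated the asymptote somewhat differently.

```python
def percent_transfer(naive, transferred, asymptote):
    """Per cent transfer: 0 when transferred Ss score like inexperienced Ss,
    100 when their first transfer session is already at the asymptote."""
    return 100 * (naive - transferred) / (naive - asymptote)


# 7.28 and 5.09 are reported in the text; 3.94 (session 14 of the quickened
# groups) is only a stand-in for the unreported asymptote.
low_u_to_q = percent_transfer(naive=7.28, transferred=5.09, asymptote=3.94)
print(round(low_u_to_q, 1))  # 65.6
```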

After eight sessions on the transfer condi- 
tion Ss were switched back to their original 
training condition to answer an exploratory 
question regarding interference introduced by 
the intervening experience (Fig. 2). For all 
groups the performance is essentially at the 
same level as immediately before the inter- 
vening experience with the second display. 
This suggests that there may be no difficulty 
in switching back and forth between quick- 
ened and unquickened systems, provided both 
have been learned to some degree of pro- 
ficiency. 

Discussion 


The results of this experiment would thus 
seem to answer the questions posed. An op- 
erator experienced with unquickened systems 
should not be penalized in learning to operate 
a quickened system, nor should an operator 
trained with a quickened system experience 
difficulty in learning to use an unquickened 
system. Instead, in either case, he might 
receive some benefit from his previous ex- 
perience. However, since transfer is not com- 
plete, some training would probably be neces- 
sary before the operator would reach the full 
potential of the system. In this regard, it 
may be of considerable practical importance 
that much less training should be required in 
the case of switching to the quickened task. 

Results similar to the present study were 
obtained by Lincoln (2), using rather dif- 
ferent tracking situations. He investigated 
transfer of training among three different pur- 
suit tracking systems. In one system a step 
displacement of the control provided a velocity of the displayed cursor (i.e., an unquickened velocity control); in a second
system a step displacement of the control 
provided both a position and a velocity of 
the displayed cursor (i.e., a quickened ve- 
locity control); and in the third system a 
step displacement of the control provided 
simply a position of the cursor (i.e., a posi- 
tion control having no counterpart in the 
present study). He found various degrees of 
positive transfer in switching either from the
unquickened control to the quickened con- 




trol or from the quickened control to the un- 
quickened control. Thus, in his study, quick- 
ened and unquickened velocity controls for 
pursuit tracking provided results similar to 
those of the present study, which used quick- 
ened and unquickened acceleration controls 
in a compensatory tracking situation. 

Interestingly, Lincoln obtained negative 
transfer in cases of switching from either the 
quickened or unquickened controls to the po- 
sition control. It would be informative to 
investigate the nature of the transfer that 
would be obtained in switching from an ac- 
celeration control to a position control or 
from a control providing position, velocity, 
and acceleration to a position control. Un- 
fortunately, the present study provides no in- 
formation on this point. 

Earlier it was explained that negative trans- 
fer might be expected since the relationship 
between the displayed error and the appro- 
priate response appears to be so different for 
the two systems. However, positive rather 
than negative transfer was obtained. Since 
the previous literature has clearly established 
the relationship between transfer effect and 
both similarity of the stimuli and similarity 
of responses involved in the two tasks, the 
results of the present study raise some ques- 
tion concerning the psychological nature of 
the two tasks used here. There seem to be 
two possibilities. First, the stimuli used even 
by naive Ss may be very different for the two 
tasks. That is to say, the loose display-con- 
trol relationship in the unquickened system 
and the tight display-control relationship in 
the quickened system might be recognized 
even by naive Ss (or at least very early in 
training). If so, the dissimilarity of the 
stimuli might prevent negative transfer. Posi- 
tive transfer would then be explained as a re- 
sult of general familiarity with the tracking 
systems employed here. The second possi- 
bility is that certain principles of responding 
might be learned in one task and transferred 
to the other task. For example, S might learn 
to avoid some amplitudes of stick movement 
which should not be used for either task. 
Thus, it would not be a matter of transfer of 
the actual stimulus-response relationship but 
rather of eliminating certain components of 




the response pattern which would be errone- 
ous in either tracking task. 

The answer to this problem must await re- 
search which determines the psychological 
nature of tracking behavior. Such basic work 
should permit prediction of how the opera- 
tor’s performance would vary as a function 
of many other variables. Until such informa- 
tion is available the task similarity theory of 
transfer will probably be of little value in 
predicting transfer effects between different 
continuous tracking systems. 


Summary 


This study was designed to determine the 
direction and extent of transfer of training 
for Ss switched to a quickened tracking sys- 
tem after having been trained with an un- 
quickened system and for Ss switched to an 
unquickened tracking system after having 
been trained with a quickened system. 

Four groups of six Ss each were used. Two 
groups were trained on the unquickened sys- 
tem and two groups were trained on the 
quickened system. Training consisted of 140 
40-sec. trials for one of the groups trained on 
the unquickened system and for one of the 
groups trained on the quickened system. For 
the remaining two groups training consisted 
of 260 40-sec. trials, one on the unquickened 
system and the other on the quickened sys- 
tem. After training, each group was switched 
to the system for which it was naive. Trans- 
fer of training was evaluated by comparing 


James G. Holland and Jean B. Henson 


the performance during this initial test ses- 
sion with the first training session of the two 
groups which originally were trained on the 
system in question. 

The results of the experiment suggest a 
number of conclusions. 

1. Positive transfer occurs in switching 
either from unquickened to quickened sys- 
tems or from quickened to unquickened sys- 
tems. 

2. Different amounts of training, within 
the range employed in the present study, 
provide no difference in the extent of transfer.

3. Transfer of training between these two 
systems is not complete. Thus, some train- 
ing is necessary before the full potential of 
the new system is reached. 


Received March 15, 1956. 


References 


1. Birmingham, H. P., & Taylor, F. V. A design 
philosophy for man-machine control systems. 
Proc. Inst. Radio Engrs, 1954, 42, 1748-1758. 

2. Lincoln, R. S. Visual tracking: III. The instru- 
mental dimension of motion in relation to 
tracking accuracy. J. appl. Psychol., 1953, 
37, 489-493. 

3. Osgood, C. E. The similarity paradox in human 
learning: a resolution. Psychol. Rev., 1949, 
56, 132-143. 

4. Searle, L. V. Psychological studies of tracking behavior. VI. The intermittency hypothesis as a basis for predicting optimum aided-tracking time constants. U. S. Naval Res. Lab. Rep., 1951, No. 3872.


Journal of Applied Psychology 
Vol. 40, No. 6, 1956 


Theory and Analysis of Component Errors in Aided Pursuit Tracking
in Relation to Target Speed and Aided-Tracking Time Constant¹


J. Richard Simon² and Karl U. Smith


University of Wisconsin 


This study deals with an analysis of op- 
erator errors in aided pursuit tracking. The 
aim is to determine how variations in target 
speed and aided-tracking time constant affect 
the type of errors made. 

Aided tracking is partial automation of the 
steering function in tracking. Its object is to 
simplify the operator’s task and thus reduce 
error. The aid supplied is a rate of cursor 
movement which is automatically generated 
by a motor system as the operator adjusts 
his hand control to follow the target. The 
amount of aid in a system is expressed by 
the aided-tracking time constant. This con- 
stant is the ratio between the amount of 
direct displacement of the cursor and the 
change in velocity of the cursor per unit of 
control movement. 
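The definition of the aided-tracking time constant can be made concrete with a small simulation. This is a hedged sketch, not the apparatus described in the paper. It assumes that cursor position is a direct-displacement term plus a motor-generated rate term, both proportional to handwheel position. It also assumes sampling at a fixed step `dt` with forward-Euler integration. The names `displacement_gain` and `rate_gain` are hypothetical; their ratio is the time constant in seconds.

```python
def aided_cursor(control, dt, displacement_gain, rate_gain):
    """Simulate cursor position for an aided-tracking control (hypothetical model).

    control           -- handwheel positions sampled every dt seconds
    displacement_gain -- cursor displacement per unit of control movement
    rate_gain         -- cursor velocity per unit of control movement
    Returns (time_constant, positions).
    """
    time_constant = displacement_gain / rate_gain  # seconds
    positions = []
    integral = 0.0  # running integral of control position (forward Euler)
    for theta in control:
        # direct displacement term plus the motor-generated rate term
        positions.append(displacement_gain * theta + rate_gain * integral)
        integral += theta * dt
    return time_constant, positions


# A unit step of the handwheel with a .5-sec. time constant
t_const, xs = aided_cursor([1.0] * 4, dt=0.1, displacement_gain=0.5, rate_gain=1.0)
print(t_const, [round(x, 2) for x in xs])
```

A unit step of the handwheel moves the cursor by .5 at once and then drives it at 1 unit per second. A smaller time constant shifts more of the response from direct displacement to rate, which is what the text calls more aiding.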

The objectives of this study are both of a 
theoretical and applied nature. What is the 
actual psychological effect of a tracking aid? 
Is the aid an actual automation of rate-con- 
trol movements, as it is supposed to be? 
Are there other effects of the aid on the track- 
ing behavior? These questions are basic to 
many theoretical questions beyond the ap- 
plied problem of the tracking aid. Partial 
answers to them are supplied by the present 
experiment. 

Method 
Apparatus 

The aided-pursuit tracking device used in this study has been described elsewhere (4, 8). The operator’s task is to keep a cursor aligned with a moving target by adjusting a handwheel control. The pattern of target movement is determined by an irregularly shaped cam driven by a 1-r.p.m. motor. The target moves in a radial course involving nine reversals of direction and continuous changes in velocity. Since target velocity is continually changing, the target-speed variable is expressed in terms of the r.p.m. of a variable-speed motor which drives the target through a ball-and-disc integrator. The pattern of target movement remains constant at the various target speeds, so over-all speed changes result from slight increases in the extent of the ten back-and-forth sweeps of the target.

1 This research has been supported by funds voted by the Legislature of the State of Wisconsin and assigned by the Graduate School Research Committee, the University of Wisconsin.

2 Presently on a Fulbright research grant at the Psychological Laboratory, University of Cambridge, England.

The error-recording system employs a generator 
and receiver selsyn which continually compare the 
position of the cursor with that of the moving 
target. When target and cursor are properly aligned, 
the shaft of the receiver selsyn remains stationary. 
However, when target and cursor are not aligned, 
the shaft of the receiver selsyn moves off the zero 
error line in the direction and to the extent of the 
distance off target. An electrically heated writing 
point attached to the shaft of the receiver selsyn 
makes a continual tracing of tracking error on 
waxed kymograph paper. 


Experimental Design and Procedure 


The data consist of tracking records for 27 Ss. 
All Ss had been trained for four days in connection 
with another study (8). The Ss reported for a fifth 
day during which the present records were taken.

The experimental design takes the form of a 3 × 3 factorial in the cells of a replicated 9 × 9 latin square. Two variables are manipulated simultaneously: target speed and aided-tracking time constant. The three target speeds used are 23 r.p.m., 30 r.p.m., and 37 r.p.m. The time constants are .25 sec., .5 sec., and 1.0 sec. The nine combinations of target speed and time constant make up the experimental conditions. Each S performs on each experimental condition in an order determined by one of the nine different sequences of conditions occurring in the latin square.
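The design can be illustrated by building a 9 × 9 latin square over the nine speed-by-constant conditions. The text does not reproduce the actual square, so this sketch assumes a cyclic construction. It also assumes that the 27 Ss are spread evenly, three per row. Function names are hypothetical.

```python
from itertools import product

SPEEDS_RPM = (23, 30, 37)
TIME_CONSTANTS_SEC = (0.25, 0.5, 1.0)
# The nine cells of the 3 x 3 factorial, e.g. (23, 0.25)
CONDITIONS = list(product(SPEEDS_RPM, TIME_CONSTANTS_SEC))


def cyclic_latin_square(n):
    """Row i is a sequence; entry [i][j] is the condition index on trial j."""
    return [[(i + j) % n for j in range(n)] for i in range(n)]


def assign_sequences(n_subjects=27):
    """Give each S one row of the square; 27 Ss use each of the 9 rows three times."""
    square = cyclic_latin_square(len(CONDITIONS))
    return {s: [CONDITIONS[k] for k in square[s % len(square)]] for s in range(n_subjects)}


schedule = assign_sequences()
print(schedule[0][:3])
```

Every condition appears once in each sequence and once at each trial position. This is what allows trials and sequences to be separated from the treatment effects in the analyses of variance.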

Error records are analyzed in a manner illustrated by Fig. 1 (6). Three categories of errors are distinguished in terms of their duration or extent along the time axis: short wavelength errors, errors of intermediate wavelength, and long wavelength errors. Defined in terms of duration, the categories are approximately as follows: short wavelength errors, less than 1 second; intermediate wavelength errors, between 1 and 3.5 seconds (i.e., between 1 per second and 1 per 3.5 seconds); long wavelength errors, 3.5 seconds or more.
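The duration-based definitions of the three error categories translate directly into a classifier. This is a sketch under stated assumptions. The boundaries are the approximate ones given in the text, and treating exactly 1 sec. as intermediate and exactly 3.5 sec. as long is an arbitrary choice. The sketch also ignores the special measurement of superimposed errors shown in Fig. 1. Names are hypothetical.

```python
SHORT_MAX_SEC = 1.0   # approximate boundary given in the text
LONG_MIN_SEC = 3.5    # approximate boundary given in the text


def classify_error(duration_sec: float) -> str:
    """Assign one error excursion to a wavelength category by its duration."""
    if duration_sec < SHORT_MAX_SEC:
        return "short"
    if duration_sec < LONG_MIN_SEC:
        return "intermediate"
    return "long"


def count_by_category(durations):
    """Tally excursion durations (seconds) into the three categories."""
    counts = {"short": 0, "intermediate": 0, "long": 0}
    for d in durations:
        counts[classify_error(d)] += 1
    return counts


# Hypothetical durations read from one record
print(count_by_category([0.3, 0.8, 1.5, 2.0, 4.0]))
```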


3 The authors gratefully acknowledge the painstaking efforts of Miss Anne Mathews in reading the error records and the assistance of Betty Pearl Simon in collecting the data.


[Figure: tracking error record; vertical axis, amplitude of errors; horizontal axis, time]

Fig. 1. Enlarged drawing of part of a tracking error record showing the method used to categorize errors. Excursions from zero error are measured from the point at which the excursion begins to the point at which it ends. The wavelengths of superimposed errors are measured parallel to the zero error line. The starting or end point of a superimposed error defines the distance from the zero error line at which this measure is taken. The letters R, I, and P on the record mean, respectively, rate control, intermediate, and positioning errors.


Results 


Figure 2 pictures the mean number of 
errors in the three categories as a function of 
aided-tracking time constant. Each value 
represents the average number of errors of a 
given wavelength made during a one-minute 
trial. Data from the three target speeds are 
combined. It can be noted that the number 
of long wavelength rate errors and short 
wavelength positioning errors both increase 
significantly as time constant increases, i.e., 
as the amount of aiding or rate control de- 
creases. However, the number of intermedi- 
ate wavelength errors shows a significant de- 
crease as the aiding decreases. 

[Figure: mean number of errors (intermediate, short, and long wavelength) plotted against aided-tracking time constant (.25, .5, and 1.0 sec.)]

Fig. 2. Mean number of errors in the three categories as a function of aided-tracking time constant.

Figure 3 pictures the mean number of errors in the three categories as a function of target speed. Data from the three time constants are combined. Only the number of short wavelength errors shows a significant increase with increasing target speed. There is a tendency for both long wavelength errors and intermediate errors to increase with target speed, but the over-all Fs are not statistically significant.

[Figure: mean number of errors (intermediate, short, and long wavelength) plotted against target speed (23, 30, and 37 r.p.m.)]

Fig. 3. Mean number of errors in the three categories as a function of target speed.

Tables 1, 2, and 3 summarize the analyses of variance of the frequency of errors in the three categories. Bartlett chi-square tests indicated heterogeneity of variance (2), so a √(x + .5) transformation was used.

Table 1
Summary of Analysis of Variance: Short Wavelength Errors

Source                          df   Mean Square        F
Target speed                     2        4.9901   18.43*
Time constant                    2       13.2094   48.78*
Speed × constant interaction     4         .4058     1.50
Trials                           8        1.0295    3.80*
Sequences                        8        5.7482     1.65
Residual between Ss             18        3.4736   12.83*
Square uniqueness               56         .3236     1.19
Residual error                 144         .2708
Total                          242

* Significant at the .01 level of confidence.

Table 3
Summary of Analysis of Variance: Intermediate Wavelength Errors

Source                          df   Mean Square        F
Target speed                     2         .1768     3.02
Time constant                    2        5.3984   20.96*
Speed × constant interaction     4         .3752     1.46
Trials                           8         .1800     1.43
Sequences                        8        1.5110     1.19
Residual between Ss             18        1.2692    4.93*
Square uniqueness               56         .2575    1.88*
Residual error                 144         .1368
Total                          242

* Significant at the .01 level of confidence.
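The variance check and the square-root transformation for error counts can be sketched as follows. Bartlett's statistic is computed from its textbook formula with k − 1 degrees of freedom, using unbiased group variances. The error counts are hypothetical, chosen so that their spread grows with their mean, which is the usual motivation for √(x + .5). Function names are illustrative, not from the paper.

```python
import math


def sqrt_transform(counts):
    """Square-root transformation sqrt(x + .5) for error counts."""
    return [math.sqrt(x + 0.5) for x in counts]


def bartlett_chi_square(groups):
    """Bartlett's chi-square for homogeneity of variance (df = k - 1).

    groups -- list of samples; each needs >= 2 values and non-zero variance.
    """
    k = len(groups)
    sizes = [len(g) for g in groups]
    variances = []
    for g in groups:
        m = sum(g) / len(g)
        variances.append(sum((x - m) ** 2 for x in g) / (len(g) - 1))  # unbiased variance
    n_total = sum(sizes)
    pooled = sum((n - 1) * v for n, v in zip(sizes, variances)) / (n_total - k)
    numerator = (n_total - k) * math.log(pooled) - sum(
        (n - 1) * math.log(v) for n, v in zip(sizes, variances)
    )
    correction = 1 + (sum(1 / (n - 1) for n in sizes) - 1 / (n_total - k)) / (3 * (k - 1))
    return numerator / correction


# Hypothetical error counts whose spread grows with their mean
low, high = [1, 2, 3, 2], [10, 16, 22, 12]
print(bartlett_chi_square([low, high]))
print(bartlett_chi_square([sqrt_transform(low), sqrt_transform(high)]))
```

With these made-up counts the statistic drops after transformation, because the ratio of the two group variances shrinks from 42 to roughly 6.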

Table 1 is a summary of the analysis of 
the short wavelength (duration) errors. Both 
target speed and aided-tracking time constant 
are significant sources of variation when 
tested against residual error. The significant 
F for trials reflects the tendency for the num- 
ber of short wavelength errors to decrease 
over the trials within a single experimental 
session. 

Tables 2 and 3 summarize the analyses of the long wavelength errors and the intermediate category errors. In both analyses, the square-uniqueness mean square is significantly larger than the residual-error mean square and is therefore used as the error term to test target speed and time constant. In both cases, only time constant proves to be a significant source of variation.

Table 2
Summary of Analysis of Variance: Long Wavelength Errors

Source                          df   Mean Square        F
Target speed                     2         .5101     2.90
Time constant                    2       12.5589   71.52*
Speed × constant interaction     4         .3799     2.16
Trials                           8         .1955     1.11
Sequences                        8        1.7117     1.54
Residual between Ss             18        1.1105    6.32*
Square uniqueness               56         .1756    1.52*
Residual error                 144         .1159
Total                          242

* Significant at the .01 level of confidence.
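The choice of error term described above can be checked against the printed mean squares. This sketch assumes a simple rule: the square-uniqueness mean square serves as the denominator whenever its own F is significant, and the residual-error mean square serves otherwise. The significance flag is taken from the table asterisks rather than computed. The helper names are hypothetical. The degrees of freedom are those listed in Tables 1–3.

```python
# Degrees of freedom shared by Tables 1-3 (27 Ss x 9 trials = 243 observations)
DF = {
    "target speed": 2,
    "time constant": 2,
    "speed x constant interaction": 4,
    "trials": 8,
    "sequences": 8,
    "residual between Ss": 18,
    "square uniqueness": 56,
    "residual error": 144,
}
assert sum(DF.values()) == 27 * 9 - 1  # total df = 242


def choose_error_term(ms_square_uniqueness, ms_residual_error, uniqueness_significant):
    """Use square uniqueness as the error term when it significantly exceeds residual error."""
    return ms_square_uniqueness if uniqueness_significant else ms_residual_error


def f_ratio(ms_effect, ms_error):
    return ms_effect / ms_error


# Table 1 (short wavelength): square uniqueness not significant -> residual error
err_short = choose_error_term(0.3236, 0.2708, uniqueness_significant=False)
print(round(f_ratio(13.2094, err_short), 2))  # time constant, 48.78 in Table 1

# Table 2 (long wavelength): square uniqueness significant -> square uniqueness
err_long = choose_error_term(0.1756, 0.1159, uniqueness_significant=True)
print(round(f_ratio(12.5589, err_long), 2))  # time constant, 71.52 in Table 2
```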


Discussion and Summary 


Records of error from 27 Ss are analyzed 
to find the relation between types of error in 
pursuit tracking and two main determinants 
of tracking accuracy: target speed and aided-tracking
time constant. Three categories of
error are distinguished in the analysis. The 
short wavelength errors are thought to rep- 
resent positioning errors or quick adjustive 
movements to get back on course. The long 
wavelength errors probably represent errors 
in rate adjustment. 

The main finding of this study is that the 
psychological effects of an aiding device are 
complex. Different types of movement which 
produce error are differentially affected by 
the aid. Increasing the aiding decreases the 
frequency of short wavelength (fine position- 
ing) and long wavelength (rate control) er- 
rors. However, errors of intermediate wave- 
length are increased in frequency when aiding 
is increased (i.e., when the aided-tracking
time constant is decreased). 

Increasing target speed generally increases 
the frequency of all types of error but this 




increase is statistically significant only for
errors of short wavelength.

The fact that the long wavelength errors 
decrease in number as aiding is increased sug- 
gests that the aid, within narrow limits, is an 
effective automation of the rate control move- 
ments. Increasing the aiding also reduces the 
number of short wavelength errors. This 
finding is in keeping with our previous (8) 
claim that a main effect of the aid is to filter 
out a certain percentage of the fine position- 
ing errors. 

Intermediate wavelength errors, which ac- 
count for most of the errors in the task, are 
increased in number with increased aiding. 
This finding both confirms and explains prior 
data (3, 5) which indicate that the aided- 
tracking device is not an aid at all but a 
hampering device. The instrumental addition 
in aided tracking produces transformations of 
movements which increase both the percep- 
tual and reactive complexity of the task be- 
yond that required in unaided tracking. Only 
in the simplest tracking tasks, involving very 
slow target speeds and uniform target courses, 
will performance be improved by the aid. 
The aid is more than an automation of rate. 
It is a filter for rapid movements and a de- 
terrent to good position control. 

The “optimum” aided-tracking time con- 
stant has been shown to vary within the gen- 
eral range of .25 to 1.0 sec. (1). Present re- 
sults indicate that the optimum constant is 
an outcome of many complex negative and 
positive effects of the aid on the component 
movements in tracking. We cannot agree 
with the interpretation of Mechler, Russel, 
and Preston (7) that an optimum time con- 
stant of .5 sec. can be interpreted as a spe- 
cific reaction-time property of discrete move- 
ments in tracking. Weaknesses in this theory 
have been pointed out previously (8). The 
present findings put an even greater burden 
on the theory for it cannot be extended to 
make provision for the compound positive 




and negative effects of an aid on the different 
types of error-producing movements. 

The experimental facts presented here give 
further support to what we have called a 
resonance theory of tracking (8). Tracking 
movements are a spectrum of continuous 
oscillatory movements related to positioning 
control and rate control. The oscillatory fea- 
tures of these movements are defined funda- 
mentally by the orbits of hand, arm, and 
body motion in tracking. The dynamic cor- 
respondence between these orbits of move- 
ment and the mechanical properties of the 
tracking device, in relation to target course 
and target speed, defines the nature of tracking error.


Received February 16, 1956. 


References 


1. Andreas, B. G., & Weiss, B. W. Review of re- 
search on perceptual motor performance un- 
der varied display-control relationships. Roch- 
ester, N. Y.: University of Rochester, 1954. 
(Sci. Rep. No. 2, Contract AF 30(602)-200.) 

2. Edwards, A. L. Homogeneity of variance and 
the latin square design. Psychol. Bull., 1950, 
47, 118-129. 

3. Lincoln, R. S. Visual tracking: III. The instru- 
mental dimension of motion in relation to 
tracking accuracy. J. appl. Psychol., 1953,
37, 489-493.

4. Lincoln, R. S., & Smith, K. U. Transfer of train- 
ing in tracking performance at different target 
speeds. J. appl. Psychol., 1951, 35, 358-362.

5. Lincoln, R. S., & Smith, K. U. Systematic analy- 
sis of factors determining accuracy in visual 
tracking. Science, 1952, 116, 183-187. 

6. Lincoln, R. S., Simon, J. R., & De Crow, T. W.
The effects of practice upon different com- 
ponent movements in visual tracking. Per- 
cept. & Mot. Skills Res. Exch., 1952, 4, 123- 
131. 

7. Mechler, E. A., Russel, J. B., & Preston, M. G. 
The basis of the optimum aided-tracking time 
constant. J. Franklin Inst., 1949, 248, 321- 
334. 

8. Pearl, Betty E., Simon, J. R., & Smith, K. U.
Visual tracking: IV. Interrelations of target 
speed and aided-tracking ratio in defining 
tracking accuracy. J. appl. Psychol., 1955, 
39, 209-214. 


Journal of Applied Psychology
Vol. 40, No. 6, 1956


Ability Grouping in Army Basic Combat Training¹


Donald C. Findlay, Seymour M. Matyas, and Hermann Rogge III 
Human Research Unit Nr 1, CONARC, Fort Knox, Kentucky 


Whether or not to group students by intel- 
lectual level has been a persistent problem 
for educators and training supervisors since 
the introduction of the group intelligence test. 
Of the several objections raised against the 
practice of ability grouping, the most fre- 
quent criticism has been that segregating slow 
learners deprives them of the help and stimu- 
lation provided by rapid learners. 

This study investigated the benefits of het- 
erogeneous ability grouping in Army Basic 
Combat Training, an eight-week program 
stressing fundamental military skills. The 
hypothesis tested was: Low-ability men, when 
grouped with higher ability men in training, 
will reach a significantly higher level of pro- 
ficiency than low-ability trainees grouped by 
themselves. 


Method 


In this study, low-ability men were trained in 
squads with the usual heterogeneous (low-medium- 
high) spread of intelligence scores, as well as in spe- 
cial squads (low-high) from which medium ability 
men were excluded. A third type of squad, contain- 
ing low-ability men only (low) served as a control. 
To facilitate “interaction” learning within each type 
of squad, a system of competition was established 
in which rewards were given on the basis of squad 
proficiency rather than individual proficiency. 

If the hypothesis was valid the low-ability men in 
squads with high-ability men (low-high) would at- 
tain the highest level of achievement, low men in 
heterogeneous squads (low-medium-high) would be 
second, and low men in squads by themselves (low) 
would be third. 

Subjects. Two experimental companies, each containing approximately 200 men, were specially organized to meet the intelligence requirements of the study. These men had been in the Army only a few days before their assignment to an experimental company. They were selected on the basis of intelligence only, using the Aptitude Area I (AAI) score of the Army Classification Battery (1). The Aptitude Area I scale, which has a mean of 100 and a standard deviation of 20, is based on the average score of the trainee’s performance on three tests: Reading and Vocabulary, Arithmetic Reasoning, and Pattern Analysis.

1 The research reported here was conducted by the senior author while he was employed by The George Washington University, Human Resources Research Office, operating under contract with the Department of the Army. The junior authors were Army enlisted men assigned to Human Research Unit Nr 1. Opinions and conclusions are those of the authors and should not be construed as representing those of the Department of the Army.

Criterion. A four-hour performance test, adminis- 
tered at the end of the training period, was the ex- 
perimental criterion. This test, for which norms 
(based on other companies) are available, is a com- 
prehensive and reliable instrument, specifically con- 
structed to measure proficiency in Basic Combat 
skills (2). 

Procedure. The experiment was conducted inde- 
pendently, but in the same manner, within each of 
the two companies. The officers, cadre, and physi- 
cal facilities of the companies were not predeter- 
mined; they were company organizations which hap- 
pened to be available for filling at the time of the 
experiment. In each company all subjects attended 
the same classes, received the same instruction, and 
in general experienced similar treatment from the 
Army. 

Three levels of intelligence or ability were defined: 
(a) low ability, AAI score of 90 or lower; (b) me- 
dium ability, AAI score of 91 through 110; and (c) 
high ability, AAI score of 111 or higher. Frequen- 
cies within each of the categories were proportional 
to the distribution of the AAI population. In each 
company, these ability levels were used to form the 
three types of squads (low-medium-high, low-high, 
low), with subjects from a given level being assigned 
randomly to the appropriate squad type.
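The three ability levels and the squad types open to each level can be expressed as a lookup. This is a minimal sketch that assumes integer AAI scores. The eligibility table is read off the squad definitions in the next paragraphs. The actual random allocation also had to satisfy each squad type's composition, and that step is not modeled here. Names are hypothetical.

```python
def ability_level(aai: int) -> str:
    """Classify an Aptitude Area I (AAI) score into the study's three levels."""
    if aai <= 90:
        return "low"
    if aai <= 110:
        return "medium"
    return "high"


# Squad types a man of each level could be placed in (from the squad definitions)
ELIGIBLE_SQUAD_TYPES = {
    "low": ("low-medium-high", "low-high", "low"),
    "medium": ("low-medium-high",),
    "high": ("low-medium-high", "low-high"),
}

for score in (85, 100, 120):
    level = ability_level(score)
    print(score, level, ELIGIBLE_SQUAD_TYPES[level])
```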

The low-medium-high squads were formed to give 
an intelligence composition approximating that usu- 
ally found in training companies, and contained 25% 
low-ability men, 50% medium-ability men, and 25% 
high-ability men. Within each of these squads, low-ability men could associate with medium- and high-ability men. The low-high squads, constructed to
provide maximum opportunity for association be- 
tween low-ability men and high-ability men, con- 
tained an equal number of each. The only kinds of 
association possible were (a) association of men 
whose abilities were approximately equal, and (b) 
association of high-ability men and low-ability men. 

The low squads, by containing only low-ability 
men, permitted low-ability men to associate, within 
the squad, with other low-ability men only. 

The experimental companies were organized in a manner intended to restrict the association and communication of the low-ability trainee to the group of men who composed his squad or squads like his. Each of the four platoons in a company was composed of four squads of the same type; thus a company contained two platoons of low-medium-high squads, one platoon of low-high squads, and one platoon of low squads. Each platoon was housed separately, a specific barracks area being assigned to each squad.

To obtain a situation especially conducive to inter- 
action learning in squads which contained low- and 
higher-ability men, a weekly competition was held 
in the form of a proficiency test over the material 
covered in the week’s instruction. Although testing
was individual, competition was based on the aver- 
age score of the entire squad. 

The competition took place within the platoon only, so that squads always competed against squads of identical ability composition. Each platoon had a winning and a losing squad each week, with the winning squad receiving week-end passes, exemption from extra-duty work details, and priority in the mess line. The losing squad in each platoon received no passes, ate last, and performed most of the extra-duty work details during the following week.
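The weekly scoring rule averages individual test scores within each squad and compares the squads of one platoon. This sketch uses hypothetical squad names and scores. The text does not say how ties were resolved, so here ties fall back on sort order.

```python
from statistics import mean


def weekly_result(platoon):
    """Return (winner, loser, averages) for one platoon's weekly test.

    platoon -- mapping of squad name to the individual scores of its members.
    Ties are broken by sort order (an assumption; the text does not say).
    """
    averages = {squad: mean(scores) for squad, scores in platoon.items()}
    ranking = sorted(averages, key=averages.get, reverse=True)
    return ranking[0], ranking[-1], averages


# Hypothetical scores for the four squads of one platoon
platoon = {"1st": [80, 90, 70], "2d": [75, 72, 70], "3d": [95, 60, 85], "4th": [88, 84, 79]}
winner, loser, averages = weekly_result(platoon)
print(winner, loser)
```

Because the squad mean decides the outcome, one low individual score lowers every squadmate's standing. This is the incentive for help between men of different ability that the design relies on.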

To inform each squad of its rank in the platoon and to identify the men within the squad whose scores had raised or lowered the group’s score, individual scores as well as squad scores were posted each week. Thus as the training program progressed, higher ability men could learn which of their squadmates needed assistance, and similarly, low-ability men could learn which of their squadmates could give them assistance.

Criterion testing with the four-hour proficiency 
test was conducted in the two companies on the last 
day of the Basic Training program. 


Results 


The performance of low-ability men on the 
criterion test was analyzed by kind of group- 
ing (squad type) and by company. The 
analysis of variance is summarized in Table 
1. Neither grouping nor company difference 
was significant. The interaction was also 
nonsignificant. 

Differences between the three levels of ability (low, medium, and high) were found to be significant beyond the .001 level in an analysis of variance summarized in Table 2. The company difference and the interaction were not significant.

Table 1
Summarized Analysis of Variance of Performance on Final Proficiency Test by Low-Ability Trainees in Three Ability Groupings

Sources of Variance     df   Mean Square      F    P
Groups                   2         21.69    .15    —
Companies                1   [mean square, F, and P illegible in source]
Groups × companies       2        207.26   1.41    —
Error                  142        147.47

Table 2
Summarized Results of Analysis of Variance of Performance on Final Proficiency Test by Three Levels of Ability

Sources of Variance            df   Mean Square        F       P
Ability levels                  2     14,429.87   117.67   <.001
Companies                       1        703.56     5.74       —
Ability levels × companies      2         90.93      .14       —
Error                         355        122.81

A mean score for the experimental com- 
panies on the criterion test was computed by 
averaging the mean scores of subjects within 
intervals of ten points on the AAI scale, i.e., 
the mean score of subjects with AAI scores 
between 71 and 80, between 81 and 90, etc. 
When this mean experimental company score 
was compared with similarly derived mean 
scores of norm companies (norms for pre- 
training companies as well as posttraining 
companies being available), it was found 
that experimental companies had apparently 
learned about 28% more than the average 
company. Analysis of variance revealed that 
differences between experimental and norm 
companies were significant beyond the .005 
level. 
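The company mean can be sketched as an unweighted average of the mean criterion scores within ten-point AAI intervals (71–80, 81–90, and so on). This follows the text's description of averaging the mean scores of subjects within intervals. The interval boundaries follow the examples given, and the records and names are hypothetical. Each interval counts once, however many men fall in it, so companies with different AAI distributions can be compared on an equal footing.

```python
from collections import defaultdict
from statistics import mean


def aai_interval(aai: int, width: int = 10) -> int:
    """Lower bound of the ten-point AAI interval (71-80 -> 71, 81-90 -> 81, ...)."""
    return ((aai - 1) // width) * width + 1


def interval_averaged_mean(records, width: int = 10):
    """Average of the per-interval mean criterion scores.

    records -- iterable of (aai_score, criterion_score) pairs.
    Each interval counts once, however many subjects fall in it.
    """
    by_interval = defaultdict(list)
    for aai, score in records:
        by_interval[aai_interval(aai, width)].append(score)
    return mean(mean(scores) for scores in by_interval.values())


# Hypothetical records: two men in the 71-80 interval, one in 81-90
records = [(75, 60), (78, 70), (85, 80)]
print(interval_averaged_mean(records))
```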


Discussion 


The results of this study fail to support the 
view that low-ability trainees profit by asso- 
ciation in training with higher ability trainees. 
Even though special motivational and organi- 
zational conditions were introduced to facili- 
tate the hypothesized interaction learning, 
low-ability men learned no more in squads 
with high-ability men or in squads with high- 
and medium-ability men than in squads with 
other low-ability men only. 

That these results did not come about 
through indifference of subjects to the pro- 
gram of squad competition seems to be shown 
by the unusually high performance of trainees 
of all ability levels on the final proficiency 




test. Likewise, the superiority of both me- 
dium- and high-ability men on the final pro- 
ficiency test indicates that higher ability sub- 
jects actually knew enough to be of help to 
their low-ability squadmates. Although men 
of all levels apparently wanted to be in win- 
ning squads and although the higher ability 
men apparently could have helped lower abil- 
ity men in their squads, it seems that (a) the 
higher ability men failed to give help, or (b) 
the help given was not sufficient to produce 
differential proficiency between low-ability 
men in the various kinds of squads. 

Since heterogeneous grouping failed to in- 
crease achievement in a situation which was 
deliberately made conducive to interaction 
learning, it is unlikely that such benefits will 
occur in similar training situations in which 
there are no special conditions.


Summary 


This study investigated the effectiveness of 
heterogeneous ability grouping as a method 
of increasing proficiency in Army Basic Com- 
bat Training. In each of two companies, low- 
ability trainees were trained under three con- 
ditions of ability grouping. One group of 




low-ability men trained in squads containing 
only low-ability men (low), one group in 
squads containing high- and medium-ability 
men also (low-medium-high), and one group 
in squads containing high-ability men also
(low-high). In spite of a system of competition
that made privileges dependent on squad per- 
formance, a proficiency test given at the end 
of eight weeks of training failed to show a 
significant difference between the learning of 
low-ability men who had high-aptitude men 
in their squads and those who did not. 
Achievement at all ability levels was unusu- 
ally high, but low men who were trained in 
squads by themselves were just as proficient 
as low men who were trained in squads with 
higher ability men. 


Received February 13, 1956. 


References 


1. A manual for the army classification battery, De- 
partment of the Army, SR 615-25-27, Feb. 21, 
1951. 

2. Baker, R. A., et al. Manual for the administra- 
tion of the individual proficiency tests for 
basic combat and advanced light infantry 
training. Human Resources Research Office, 
George Washington Univer., 1955 (TR 19). 


Journal of Applied Psychology 
Vol. 40, No. 6, 1956


Differentiation of Individuals in Terms of Their Predictability 


Edwin E. Ghiselli 
University of California 


When scores on a test are unrelated to cri- 
terion scores or are related to them only to a 
very low degree, the presumption is that the 
test is of little value. Hence, in a prediction
or selection situation, tests with low validity
are quickly discarded and the entire effort is
directed to the development of tests which 
will yield scores that are substantially related 
to the criterion. 

Even though the validity coefficient of a 
test is negligible, there is the possibility that 
at least with certain individuals reasonably 
accurate predictions of criterion performance 
nevertheless may be made from scores on the 
test. As one regards the scatter diagram of 
the scores on two variables that exhibit a low 
relationship, it is apparent that some indi- 
viduals fall on or very close to the line of re- 
lations while others depart markedly from it. 
Thus for some individuals there is quite close 
correspondence between standard scores on 
the test and standard scores on the criterion. 
The remainder of the individuals display to 
varying degrees differences between standard 
test and standard criterion scores. 

Suppose, as the author has suggested else- 
where (5), that it were possible by some 
other means, perhaps another test, to differ- 
entiate those individuals whose test and cri- 
terion scores show small discrepancies from 
those individuals whose test and criterion 
scores are markedly different. Then it would 
be possible to screen out a group for which at 
least reasonably accurate predictions can be 
made. Thus even though the validity of the 
test for the entire group is low, for some in- 
dividuals who can be differentiated before- 
hand, the test would have some practical 
utility. 

In a somewhat different form this notion is 
implicit in dealing with individual cases in 
clinical and guidance work. Consider the 
case of a counselor attempting to decide 
whether a young person should seek education
above the secondary school level. If the young
person's motivation or interests seem inappropriate,
the counselor might not recommend college
even though the intelligence test score is high.
In effect, what is being said is that when the 
individual possesses certain other character- 
istics, there will be little correspondence be- 
tween test performance and college achieve- 
ment. 

Therefore there is nothing new in the no- 
tion that it is possible to differentiate between 
those individuals for whom a test is a good 
predictor and those for whom it is a poor one. 
However, it remains to be seen whether it is 
possible to make such a differentiation in a 
systematic and objective fashion. It is the 
purpose of the present investigation to ex- 
amine this possibility. 


Methods and Procedures 


Scores from one test and two inventories were ob- 
tained on candidates for the job of taxicab driver at 
the time of hiring. The test consisted of tapping 
and dotting items, and the inventories consisted of 
24 pairs of forced-choice items which sought to get 
at appropriateness of occupational level and interest 
in jobs involving personal relationships. The details 
of these devices have been described elsewhere (1). 
Previous investigations have indicated that these de- 
vices have some, though modest, validity for vari- 
ous aspects of the job of taxicab drivers (2, 3, 4). 

In the present investigation the criterion of job 
proficiency consisted of production during the first 
12 weeks of employment. Raw production figures 
were corrected for temporal variation and differ- 
ences in division in which the driver operated. Rec- 
ords were obtained on 193 men who were randomly 
divided into two groups, 100 comprising an experi- 
mental group, and 93 a cross-validation group. 


Results 


The validity coefficients of the three pre- 
dictors together with their intercorrelations 
for the experimental group are given in 
Table 1. The validity of the tapping and 
dotting test at best can be characterized as 
limited. Neither of the two inventories has 
any appreciable value as a selective device. 
It is apparent that any combination of scores
on the test and either of the inventories, as
through multiple correlation, would have no
greater validity than that of the test alone.

Table 1

Validity Coefficients of and Intercorrelations
Among Predictor Variables for the
Experimental Group

                        Tapping    Occupational   Personal
                        and        Level          Relationships
Variables               Dotting    Inventory      Inventory
Criterion               .259       .055           .125
Difference Score                   .318           .126
Tapping and Dotting                .029           -.283

For each individual in the experimental 
group the difference between his standard 
score on the tapping and dotting test and his 
standard criterion score was computed. Dif- 
ferences in sign were ignored; hence an indi- 
vidual with a low difference score was one 
whose standard test and criterion scores were 
very similar, and an individual with a large 
difference score was one whose standard test 
and criterion scores were very different. The 
coefficients of correlation between these dif- 
ference scores and scores on the two inven- 
tories are given in Table 1. The coefficient 
of correlation was found to be of moderate 
size for the occupational level scale and low 
for the personal relationships scale. There- 
fore there was a tendency for those individu- 
als who made a low score on the occupational 
level inventory to display a correspondence 
between standard test and criterion scores, 
and for those individuals who made a high 
score to show a discrepancy between test and 
criterion scores. There was little such tend- 
ency in the case of the personal relationship 
inventory. 
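The difference-score computation described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the author's procedure: standard scores are taken with the population standard deviation, and the helper names (`standardize`, `difference_scores`, `pearson_r`) are hypothetical.

```python
import statistics

def standardize(scores):
    """Convert raw scores to standard (z) scores, using the population SD."""
    mean = statistics.fmean(scores)
    sd = statistics.pstdev(scores)
    return [(s - mean) / sd for s in scores]

def difference_scores(test, criterion):
    """Absolute difference between each person's standard test score and
    standard criterion score; differences in sign are ignored."""
    return [abs(t - c) for t, c in zip(standardize(test), standardize(criterion))]

def pearson_r(xs, ys):
    """Pearson product-moment correlation coefficient."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5

# Hypothetical usage: a positive r means that high inventory scorers tend to
# show large test-criterion discrepancies, i.e., they are less predictable.
# r = pearson_r(difference_scores(test, criterion), inventory)
```

A small difference score marks a person whose standing on the test and on the criterion is about the same; correlating these scores with an inventory asks whether the inventory can flag such people in advance.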

From the foregoing it would appear that if 
only those individuals who made low scores 
on the occupational level inventory were used, 
the coefficient of correlation between scores 
on the tapping and dotting test and the cri- 
terion would be greater than the value of .259 
obtained for the entire group. However, no 
such tendency should result from a similar 
selection on the basis of the personal rela- 
tionship inventory. 




To examine this notion, the validity coeffi- 
cients for the tapping and dotting test were 
calculated for the cross-validation group using 
three degrees of selectivity on the basis of 
the two inventories. The validity of the test
scores was calculated for the one-third and the
two-thirds earning the lowest scores on each
inventory, as well as for all cases. The first of these groups
should be composed of the one-third of the 
individuals whose performance is quite pre- 
dictable with the least predictable two-thirds 
discarded. The second group should be com- 
posed of the individuals whose performance is 
fairly well predictable with the least pre- 
dictable one-third discarded. 
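The subgroup validities can be sketched as below. This is a minimal illustration, assuming raw score lists for one inventory at a time; `validity_in_lowest_fraction` is a hypothetical helper, and the rounding of subgroup size and the handling of tied inventory scores (original order kept) are assumptions the article does not specify.

```python
import statistics

def pearson_r(xs, ys):
    """Pearson product-moment correlation coefficient."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5

def validity_in_lowest_fraction(test, criterion, inventory, fraction):
    """Validity of the test for the given fraction of cases scoring LOWEST on
    the screening inventory (the cases expected to be most predictable).
    Ties in inventory score keep their original order (sorted() is stable)."""
    n = len(test)
    k = max(2, round(n * fraction))  # a correlation needs at least two cases
    keep = sorted(range(n), key=lambda i: inventory[i])[:k]
    return pearson_r([test[i] for i in keep], [criterion[i] for i in keep])

# Hypothetical usage with the three degrees of selectivity:
# for f in (1 / 3, 2 / 3, 1.0):
#     print(f, validity_in_lowest_fraction(test, criterion, inventory, f))
```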

For the one-third of the individuals in the
cross-validation group whose scores on the
occupational level inventory indicated their job
performance should be quite predictable from
the tapping and dotting test the validity co- 
efficient was found to be .664, whereas the 
validity coefficient for the most predictable 
two-thirds of the individuals was .323, and 
that for all cases only .220. On the other 
hand, for the one-third of the individuals 
whose scores on the personal relationships 
inventory indicated their job performance 
should be most predictable, the validity of 
the test was .000, for the most predictable 
two-thirds it was only .130, and for all cases 
it was .100. 

In a practical selection situation, such as 
the present one with taxicab drivers, a first 
elimination of applicants can be made by 
dropping out those individuals for whom pre- 
diction of job success by means of the selec- 
tion test is likely to be poor. Then a second 
elimination can be made on the basis of the 
selection test, picking those individuals whose 
scores are high. Thus in the present case 
those candidates scoring high on the occupa- 
tional level inventory could be first elimi- 
nated. This process would leave those whose 
performance is substantially related to scores 
on the tapping and dotting test. Then those 
scoring low on this test could be eliminated,
resulting in the retention of a group whose
average criterion performance is high. If the 
personal relationship inventory were used, no 
such benefits should accrue. 
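The two-stage procedure just described might be sketched as follows. This is a hypothetical illustration, not the author's implementation: `two_stage_select` is an invented name, applicants are identified by list index, and ties are broken by original order.

```python
def two_stage_select(test, inventory, n_first, n_keep):
    """Hypothetical two-stage screen.

    Stage 1: drop the n_first applicants with the HIGHEST inventory scores
    (those for whom the test is expected to predict poorly).
    Stage 2: from the survivors, keep the n_keep with the highest test scores.
    Returns the retained applicants' indices in ascending order.
    """
    n = len(test)
    if n_first + n_keep > n:
        raise ValueError("cannot drop n_first and still keep n_keep applicants")
    survivors = sorted(range(n), key=lambda i: inventory[i])[: n - n_first]
    best = sorted(survivors, key=lambda i: test[i], reverse=True)[:n_keep]
    return sorted(best)
```

For example, with test scores `[10, 9, 8, 7, 6]` and inventory scores `[5, 1, 2, 3, 4]`, dropping one applicant in the first stage removes applicant 0, the top test scorer, because of a high inventory score; the second stage then keeps applicants 1 and 2.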

The question then arises as to what proportion
of candidates should be dropped out
by the first screening and what proportion by 
the second screening. For example, if it is 
desired to obtain from a group of individuals 
20% whose criterion performance will be sig- 
nificantly better than average, should 40% 
be dropped in the first screening and 40% in 
the second screening, or should 20% be 
dropped in the first screening and 60% in 
the second screening? No definitive answer to 
this question can be offered at the present 
time. Undoubtedly the optimal percentages 
to be eliminated in the two screenings will be 
a function of the magnitude of the correla- 
tions between the tests, the criterion, and the 
difference scores. 

On purely rational grounds it would appear 
that the optimal percentages eliminated in the 
two screenings would be nearly the same. If 
a very high proportion is eliminated in the 
first screening, while to be sure the predic- 
tion of success of the remainder will be good, 
there will be so few individuals left to elimi- 
nate in the second screening that there will 
be very little improvement in criterion scores.
On the other hand, if very few are eliminated 
in the initial screening, then the validity of 
the selection test for the second screening will 
be so low that even with a high proportion 
eliminated the gain will be small. 

To illustrate the problem an example using 
the cross-validation group is presented in 
Fig. 1. The objective of the selection proc- 
ess is taken as the selection of the best 20% 
of candidates. Then various distributions of 
elimination between the two screenings can 
be made of the remaining 80%. At one ex- 
treme none can be eliminated in the first 
screening and the entire 80% can be elimi- 
nated on the basis of the second screening. 
At the other extreme 80% could be elimi- 
nated on the basis of the first screening and 
none in the second screening. The mean of 
the standard criterion scores of the “best” 
20% of individuals selected by various dis- 
tributions of percentages of elimination at 
the two stages was calculated. The mean
criterion scores of the individuals remaining 
after elimination are shown in Fig. 1. 
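The calculation behind Fig. 1 can be sketched as a sweep over the split of the 80% eliminated. This is a minimal sketch under assumptions not stated in the article: criterion scores are standardized over the whole group, group sizes are rounded to the nearest case, and the helper names are hypothetical. At 0% all eliminations are made on the test; at 80% all are made on the inventory.

```python
import statistics

def mean_selected_criterion(test, criterion, inventory, pct_first, pct_keep=20):
    """Mean standard criterion score of the pct_keep% finally selected when
    pct_first% of the whole group is dropped on the inventory (highest scores
    first) and the remaining eliminations are made on the test."""
    n = len(test)
    n_first = round(n * pct_first / 100)
    n_keep = round(n * pct_keep / 100)
    survivors = sorted(range(n), key=lambda i: inventory[i])[: n - n_first]
    chosen = sorted(survivors, key=lambda i: test[i], reverse=True)[:n_keep]
    # Criterion scores are standardized over the whole group (an assumption).
    mu, sd = statistics.fmean(criterion), statistics.pstdev(criterion)
    return statistics.fmean([(criterion[i] - mu) / sd for i in chosen])

def sweep(test, criterion, inventory, pct_keep=20, step=10):
    """Map each first-screening percentage (0, step, ..., 100 - pct_keep) to
    the mean criterion score of the finally selected cases."""
    return {p: mean_selected_criterion(test, criterion, inventory, p, pct_keep)
            for p in range(0, 100 - pct_keep + 1, step)}
```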

Reference to Fig. 1 will show that, using
the occupational level inventory as the basis
for the first elimination, the final results are
poorest when a very large proportion of
individuals is eliminated either in the first
screening or in the second screening. A more
equitable division of elimination between first
screening and second screening is superior.
Best results were obtained when a somewhat
larger proportion was eliminated in the first
than in the second screening. The personal
relationship inventory, which has little or no
value in selecting predictable individuals, does
nothing to improve the selection of high
performers on the criterion.

[Figure: two curves, one for the Occupational Level Inventory and one for the Personal Relationship Inventory, plotting mean criterion score of selected cases against percent eliminated in the first screening (0 to 80) and, correspondingly, in the second screening (80 to 0).]

Fig. 1. Mean criterion scores of workers surviving the selection process under various conditions of selection.


Discussion 


The results of this study point to the pos- 
sibility of distinguishing applicants whose job 
performance can be predicted by ordinary 
selective procedures from those whose per- 
formance is poorly predicted. Selective pro- 
cedures, therefore, can be improved not only 
by the addition of highly valid predictors to 
present procedures, but also by the addition 
of devices to screen out individuals whose 
levels of aptitude and job proficiency show 
little correspondence. 

The investigation reported here is not suffi- 
ciently extensive to furnish many clues con- 
cerning the kinds of variables that will be 
useful in this type of screening. It seems 
likely that such variables will have a considerable
degree of specificity for each particular
selection situation. However, the results
obtained with the occupational level inven- 
tory do suggest one interesting possibility. 
Each item in the inventory called for a 
choice to be made between two jobs in terms 
of their interest to the testee. The two jobs 
were similar in nature but one was at a 
lower and the other at a higher level, e.g., 
bookkeeping and accounting. Since the job 
of taxicab driver is only at the semi-skilled 
level, presumably it would not provide sufficient
challenge for a person with higher occupational
ambitions. Therefore low scores
on the inventory were taken as the most ap- 
propriate. 

As was seen, scores on the occupational in- 
ventory were unrelated to proficiency, yet 
they did distinguish those individuals whose 
aptitude and achievement levels were similar 
from those whose levels were different. If 
the inventory does measure occupational 
goals, then it would appear that the inclusion
of both individuals whose goals are appropriate
and individuals whose goals are inappropriate
in a validation study masks the predictive
power of the aptitude measure being evaluated.


Received January 27, 1956. 


References 


1. Brown, C. W., & Ghiselli, E. E. Age of semiskilled
workers in relation to abilities and interests.
Personnel Psychol., 1949, 2, 497-511.

2. Brown, C. W., & Ghiselli, E. E. Prediction of
labor turnover by aptitude tests. J. appl.
Psychol., 1953, 37, 9-12.

3. Brown, C. W., & Ghiselli, E. E. The prediction
of proficiency of taxicab drivers. J. appl.
Psychol., 1953, 37, 437-439.

4. Ghiselli, E. E., & Brown, C. W. The prediction
of accidents in taxicab drivers. J. appl.
Psychol., 1949, 33, 540-546.

5. Ghiselli, E. E. Worker selection; concepts and
problems. Personnel Psychol., 1956, 9, 1-16.






Optimum Letter Size for a Given Display Area¹


C. S. Bridgman and E. A. Wade²


University of Wisconsin 


A number of studies have investigated the 
visibility of printed materials as a function 
of the design and spacing of the symbols em- 
ployed (see summary in reference 5, Part III, 
Ch. IV, Section II). A related problem can 
be stated as follows. Given a certain display 
space, limited by a high-contrast border, what 
is the maximally visible size of inscribed let- 
ters? To make the problem more concrete, 
we can think of an instrument-panel window 
within which a line of letters is placed. Es- 
thetic and artistic considerations have always 
demanded a margin between inscribed letters 
and the limits of their background. On the 
other hand, printing, particularly lower case 
letters, can still be read when a considerable 
portion of the detail is masked, although this 
has not been tested under threshold condi- 
tions. However, in view of the recognized 
adverse influence of local brightness differ- 
ences on the perception of other nearby con- 
tours (2), better visibility might be found in 
the situation described above for letters small 
enough that their critical contours were some- 
what removed from the high contrast region 
at the edges of the viewing field. Thus better 
visibility might be achieved by using smaller 
letters than the maximum size possible in a 
given space. An experiment has been carried 
out to explore this problem using single lines 
of block capital letters, and a visual acuity 
criterion of visibility. 


Method 
Apparatus 


A variable magnification projector was used to
measure visual acuity.³ Test material consisted of
the five lines of five letters each on the projection
slide of this instrument. The instrument has an
independent opaque masking slide, with apertures
which govern the relative size of the bright
background. This was modified to provide three ratios
of letter size to vertical dimension of the
background. One of these apertures was made the same
height as the letters, so that the limit of the
background was tangent to the upper and lower edges
of the letters (one-to-one ratio). A second aperture
provided clearance, above and below, one-fifth the
size of the letters (ratio of letter to field size of
1:1.4), and the third provided a clearance of 2.25
times the letter size (1:5.5). It should be
understood that the aperture was projected by the
variable magnification system, so that the relations stated
above were maintained as letter size was varied to
determine thresholds.

1 Supported in part by a grant from the Graduate
Research Committee, from funds provided by the
Wisconsin Alumni Research Foundation.

2 Now at Tufts University.

3 Clason Acuity Meter, formerly manufactured by
Bausch & Lomb Optical Company.
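The field-to-letter ratios follow from the clearances by simple arithmetic: a margin left both above and below the line adds twice the clearance to the letter height. A short sketch; `field_ratio` is a hypothetical name, and clearance is expressed in letter heights.

```python
def field_ratio(clearance):
    """Ratio of field height to letter height when a clear margin of
    `clearance` letter heights is left both above AND below the line."""
    return 1 + 2 * clearance

# Block capitals on a five-unit grid have a stroke width of one-fifth the
# letter height, so a one-stroke margin is a clearance of 0.2.
for c in (0.0, 1 / 5, 2.25):
    print(c, field_ratio(c))
```

No clearance gives the one-to-one field, one-fifth gives 1.4, and 2.25 gives 5.5, matching the three apertures.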

Projection and observation distance were both 20 
feet. The scale on the instrument was modified to 
read directly in minutes of visual angle subtended at 
S’s eye by the projected image of the letters (i.e.,
five times the unit dimension). 

Two background luminance levels were employed, 
8.45 mL. and 0.084 mL. The former was achieved 
by using an aluminized projection screen, the latter 
by projecting onto a flat black mat surface. 


Subjects 


The forty subjects used in this study were men 
and women students from elementary courses in psy- 
chology at the University of Wisconsin. Each had 
at least 20/20 vision as measured on a printed Snel- 
len test chart. Binocular viewing was used through- 
out the experiment. 


Procedure 


To minimize apparent improvements in acuity 
based on increasing familiarity with test material as 
the sessions progressed, Ss were instructed to adopt 
an “ease of reading” criterion. As size was increased, 
the S was asked to state at which size the letters 
first appeared to be just easily readable. As a check, 
he was asked to read the letters. If he missed more 
than one letter, a new determination of acuity was 
made.

The general experimental procedure was as fol- 
lows: First, one measurement was taken with all five 
lines of letters exposed, at the higher intensity. 
Then, for one group of 20 Ss, the smallest aperture 
was introduced and five thresholds determined, one 
for each line of letters. The procedure was repeated 
for the 1.4 field and then for the 5.5 field. Thresholds
were then determined for the lower intensity,
with field size presented in the same order. 

For the second group the procedure was the same, 
except the order of presentation of field size was 
reversed. 




Finally, a recheck was made at high intensity with 
all five lines exposed, to determine the extent of 
practice effects. Apparently the threshold technique 
employed was effective in reducing such effects, be- 
cause the mean final thresholds were less than 0.2 
minutes of visual angle lower than the initial thresh- 
old for each group. 

Order of presentation of the five lines of letters 
was counterbalanced among Ss and conditions. 


Results 


Mean thresholds for each subject for each 
condition were determined, and the over-all 
means, for both groups combined, are pre- 
sented in Table 1. As expected, thresholds 
are higher (poorer acuity) when the edge of 
the background field is closer to the letters. 
An analysis of variance of the individual 
means indicated that the relative field size 
variable is statistically significant (p < .001).
Thresholds improved nearly 11% at both lu- 
minance levels when the small surround was 
added, and 18 to 20% when the larger sur- 
round was provided. 
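The percentage improvements quoted above can be checked from the means in Table 1. A minimal sketch; the dictionary layout and `pct_improvement` are illustrative names, and "improvement" is taken to mean the proportional reduction in threshold letter size relative to the no-surround (1.0) condition.

```python
# Threshold letter sizes (minutes of visual angle) from Table 1, keyed by the
# ratio of field to letter size: (8.45 mL., 0.084 mL.).
THRESHOLDS = {1.0: (6.00, 8.93), 1.4: (5.36, 7.97), 5.5: (4.82, 7.29)}

def pct_improvement(no_surround, with_surround):
    """Percentage reduction in threshold letter size relative to no surround."""
    return 100 * (no_surround - with_surround) / no_surround

for ratio in (1.4, 5.5):
    gains = [pct_improvement(base, t)
             for base, t in zip(THRESHOLDS[1.0], THRESHOLDS[ratio])]
    print(ratio, [round(g, 1) for g in gains])
```

This gives about 10.7 and 10.8% for the narrow surround and about 19.7 and 18.4% for the wide one, consistent with "nearly 11%" and "18 to 20%".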

We were also interested, however, according
to the original question, in determining
the total (vertical) size of the display under 
these threshold conditions in order to see if 
providing a background or surround improves 
visual acuity sufficiently to compensate for 
the extra space taken up, i.e., enough to per- 
mit readable letters in a space the same as or 
smaller than that required when the letters 
are presented without a surround. The total 
vertical dimension of each of the three dis- 
plays, at threshold, can be obtained by multi- 
plying the threshold letter size in Table 1 by 
the corresponding field size ratio. The
results are presented in Table 2. Even for the
smallest size of field used, which provided a
clearance above and below the letters only
equal to the stroke width, the threshold field
size is increased by about 25%, at both
luminance levels, as compared to the
no-surround condition.

Table 1
Threshold Letter Size, in Minutes of Visual Angle

Ratio of Field       Luminance Level (mL.)
to Letter Size       8.45        0.084
1.0                  6.00        8.93
1.4                  5.36        7.97
5.5                  4.82        7.29

Table 2
Size of Field (Vertical Dimension in Minutes of Arc)
with Letters at Threshold

Ratio of Field       Luminance Level (mL.)
to Letter Size       8.45        0.084
1.0                  6.00        8.93
1.4                  7.50        11.1
5.5                  26.5        40.0
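Table 2 is obtained by multiplying each threshold letter size in Table 1 by its field ratio, and the roughly 25% figure is the 1.4 field divided by the 1.0 field. A sketch of that arithmetic; the names are illustrative. Products computed from the rounded Table 1 means (e.g., 11.16 and 40.10) may differ in the last digit from the published Table 2, possibly because that table was computed from unrounded means.

```python
# Threshold letter sizes from Table 1 (minutes of visual angle), keyed by the
# field-to-letter ratio: (8.45 mL., 0.084 mL.).
THRESHOLDS = {1.0: (6.00, 8.93), 1.4: (5.36, 7.97), 5.5: (4.82, 7.29)}

def field_size(ratio, letter_threshold):
    """Total vertical extent of the display when the letters are at threshold."""
    return ratio * letter_threshold

for ratio, sizes in THRESHOLDS.items():
    fields = [field_size(ratio, s) for s in sizes]
    # Size relative to the no-surround display at the same luminance.
    relative = [f / base for f, base in zip(fields, THRESHOLDS[1.0])]
    print(ratio, [round(f, 2) for f in fields], [round(r, 2) for r in relative])
```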


Discussion 


The effect of size of surround on various 
visual thresholds has been extensively investi- 
gated (1, 3, 4). In general, presentation of 
a threshold test object on a small surround 
results in poorer performance (higher thresh- 
olds) than with a larger surround. Although 
these phenomena are commonly formulated 
in terms of the area of the surround, Fry and 
Bartley have shown (2) that a critical factor 
in such findings is the proximity of a “border” 
(e.g., the transition from the illuminated sur- 
round to the dark background) to the thresh- 
old contours. Since visual acuity thresholds 
depend on the establishment of threshold 
gradients of excitation corresponding to the 
differences in intensity in the retinal image, 
it is not surprising to find that acuity thresh- 
olds are similarly depressed when the sur- 
round field is reduced and the border is closer 
to the test letters. 

Some part of the loss in acuity when the 
border was contiguous with the upper and 
lower edges of the letters might be attributed 
to modification and obliteration of some of 
the form and shape characteristics of the let- 
ters. This factor would presumably supple- 
ment the threshold depressing effects of the 
reduction in surround and proximity of the 
border. Actually, with the type of letters 
used in this experiment, there appeared to be 
little confusion introduced by this factor. 

Usual practice would dictate leaving a clear 
surround of considerable extent between let- 
tering and the edge of a drawing or between 
the letters and any border lines surrounding 


380 


the letters. Although the context of this ex- 
periment is not closely comparable to such 
situations, it would appear that part of the 
justification of this practice can be found in 
actual improvement of threshold discrimina- 
bility of letters of a given size when a sur- 
rounding field is provided. More improve- 
ment would occur, however, if the available 
space were utilized by making the letters 
nearly as large as possible, since it was found 
that leaving a clear field only as large as 
the stroke width of the letters required 25% 
more space, over all, to provide letters of 
threshold size, and this under conditions
where the border was of high contrast and
therefore presumably had its maximum effect
on threshold.

It is possible that tests made with some 
even narrower field would have resulted in 
field size at threshold equal to, or perhaps 
slightly smaller than, that obtained when the 
letters completely filled the field. However, 
when space limitations are a primary consid- 
eration, letters should be made as large as 
possible, at least to the point of very nearly 
filling the available space, in order to permit 
discrimination at a maximum distance. 

Some letters and other similar symbols 
might be especially adversely affected by hav- 
ing the edge of the field contiguous with the 
symbol. Consequently, it would scarcely be 
recommended to leave no discriminable field. 
Also it should be noted that other criteria of 
visibility, such as speed of recognition of let- 
ters or words of supraliminal size, might show 
more adverse effect of proximity of a high- 
contrast border. Although this experiment 
was carried out with a 20-foot viewing dis- 
tance, there seems to be no reason to sup- 
pose that the retinal mechanisms involved 
would not operate in a similar manner with 
shorter viewing distances, if the same cri- 
terion of visibility (acuity threshold) were 
employed.




Summary 


Visual acuity determinations were made on 
two groups of 20 subjects each at two lumi- 
nance levels (8.45 and 0.084 mL.) with three 
conditions of surround, or field clearance, 
above and below the line of letters. Provid- 
ing a field equal to the stroke width of the 
letters gave improvement in mean acuity 
thresholds of nearly 11% over those obtained 
with no field, and the wider field (2.25 times 
the letter size) gave improvements of 18 to 
20%. These proportional increases were ap- 
proximately equal at both luminance levels. 

When the data are examined in terms of 
the over-all size of field required to provide 
threshold letters, however, it is found that 
the decrease in letter size is not enough to 
compensate for the additional space taken up 
by the field. It is concluded that, when 
space limitations are a consideration, letters 
should be made as large as possible up to the 
point of very nearly filling the available space 
(margin less than the stroke width of the 
letters), in order to permit discrimination at 
a maximum distance. 


Received January 25, 1956. 


References


1. Fry, G. A. Effects of uniform and non-uniform
surrounds on foveal vision. Amer. J. Optom.
& Arch. Amer. Acad. Optom., 1950, 27, 423-436.

2. Fry, G. A., & Bartley, S. H. The effect of one
border in the visual field upon the threshold
of another. Amer. J. Physiol., 1935, 112,
414-421.

3. Ratoosh, P., & Graham, C. H. Areal effects in
foveal brightness discrimination. J. exp.
Psychol., 1951, 42, 367-375.

4. Wald, G. Area and visual threshold. J. gen.
Physiol., 1938, 21, 269-287.

5. Handbook of human engineering data. Tufts
College, 1952.




Empirical Assessment of Handrail Diameters¹


Norman B. Hall, Jr. 
Dunlap and Associates, Inc. 
and Edward M. Bennett 
Tufts University


The present study² considers one specification
of public stairways, the handrail diameter.
The American Standards Association
(1) reports, “The size rail shall be; where of 
hard wood at least 2 inches in diameter; 
where of metal pipe at least 1.5 inches in 
diameter.” 

One major insurance company (2) speci- 
fies, “Rail should be approximately 2 inch in 
width and should be either round or so 
shaped as to permit comfortable grasping.” 

The following reports an empirical assess- 
ment of these specifications for round hard- 
wood rail. Since the large majority of hand- 
rail users are women, and since the construc- 
tion of women’s shoes and skirts adds a safety 
hazard which must partially be offset by 
handrails, only women were used in the 
study. 

Fifty-one female clerical employees, vary- 
ing in age from 20 to 60, were studied in 
counterbalanced ascending and descending se- 
ries. They were questioned four times con- 
cerning the handrail, twice going up and 
twice coming down stairs. A double forced- 
choice method of questioning was used. 

The stairway used in this study was one 
flight of stairs between the third floor and 
the landing one-half flight down, in a mod- 
ern office building. The continuous hand- 
rail was removed and replaced by four ex- 
perimental sections of 1.5, 1.75, 2.00 and 
2.25 inches diameter. The four sections were 
of equal length and placed in order of decreasing
diameter for descent (and thus of increasing
diameter for ascent).

The subjects were instructed as follows
before ascending or descending: “Will you
please use the handrail while going down
(up) the stairs. When you have reached the
bottom (top) you will be asked some
questions with regard to the handrail.”

1 At the request of, and sponsored by, the Liberty
Mutual Insurance Company, Boston.

2 One of a series of studies in the psychology of
safety.

Question 1, preference, asked, “Which sec- 
tion was most pleasing to use? Which was 
least pleasing? From the two remaining, 
which was the most pleasing? Which was 
the least pleasing?” 

Question 2, felt safety, asked, “Which sec- 
tion do you feel would have given you the 
most security if you had started to fall? 
Which the least security?” 

As a result we had four choices (weighted 
4, 3, 2, and 1) for two questions (preference 
and felt safety), for four rail diameters (1.50, 
1.75, 2.00, and 2.25 inches). 

Two analyses were made. The first was of the
distribution of diameters chosen by the 51
subjects as first choice (most preferred and
greatest felt safety), for ascent and descent.
These results are shown in Fig. 1.

The second was of the distribution of mean choice
(preference and felt safety) scores for the
various diameters, ascending and descending
combined. Results are shown in Fig. 2.
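The weighted scoring can be sketched as follows. This assumes each response yields a full ranking of the four diameters, which the four-step forced-choice procedure gives for the preference question; the article does not say how the two middle positions were weighted for felt safety, where only the most and least secure sections were asked for. The function names are hypothetical.

```python
DIAMETERS = (1.50, 1.75, 2.00, 2.25)  # inches
WEIGHTS = (4, 3, 2, 1)                # 1st, 2nd, 3rd and 4th choice

def mean_choice_scores(rankings):
    """`rankings` holds one tuple per response, listing the four diameters
    from most to least chosen. Returns the mean weight each diameter earned."""
    totals = dict.fromkeys(DIAMETERS, 0)
    for order in rankings:
        if sorted(order) != sorted(DIAMETERS):
            raise ValueError(f"not a ranking of all four diameters: {order}")
        for weight, diameter in zip(WEIGHTS, order):
            totals[diameter] += weight
    return {d: totals[d] / len(rankings) for d in DIAMETERS}

def first_choice_counts(rankings):
    """How often each diameter was ranked first (the basis of Fig. 1)."""
    counts = dict.fromkeys(DIAMETERS, 0)
    for order in rankings:
        counts[order[0]] += 1
    return counts
```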


[Figure: two panels, Preference and Felt Safety; percentage of first choices plotted against handrail size (1.50, 1.75, 2.00, 2.25 inches), with separate curves for ascent and descent.]

Fig. 1. Percentage of cases choosing various handrail diameters as first choice.

Fig. 2. Mean choice intensity as a function of handrail diameter.


Based upon these findings the following 
conclusions were drawn: 

1. Diameters of 1.75 and 2.00 inches are
about equally preferred in descent. The di- 
ameter of 2.00 inches is most preferred for 
ascent. 

2. The diameter of 2.00 inches gives the 
greatest feeling of safety both in ascent and 
descent. 

3. In terms of over-all preference a diameter
of 1.90 inches is suggested as ideal. The
suggested latitude of deviation from this ideal
is upward, but not to exceed 2.00 inches.

4. In terms of felt safety a diameter of 
2.00 inches is suggested.

The distributions of first choices for each
of the rail diameters were significantly different
from those to be expected on the basis
of chance. When Question 1 was used, the 
distribution resulted in a chi square of 102, 
significant at beyond the .01 level. When 
Question 2 was used, the chi square was 122, 
also significant at beyond the .01 level. 
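A chi-square test against chance, meaning equal expected frequencies for the four diameters, can be sketched as below. The counts are hypothetical, not the study's data; the article does not state the degrees of freedom or exactly how the frequencies behind the values of 102 and 122 were tabulated, so df = 3 (four categories) is an assumption here. The critical values are the standard tabled ones for df = 3.

```python
def chi_square_uniform(counts):
    """Chi-square goodness-of-fit statistic against equal expected frequencies
    (the 'chance' hypothesis that every diameter is equally likely)."""
    expected = sum(counts) / len(counts)
    return sum((observed - expected) ** 2 / expected for observed in counts)

# Standard critical values of chi-square for df = 3 (four categories).
CRITICAL_DF3 = {0.05: 7.815, 0.01: 11.345}

hypothetical_counts = [5, 15, 25, 6]  # NOT the study's data; sums to 51
stat = chi_square_uniform(hypothetical_counts)
print(round(stat, 2), stat > CRITICAL_DF3[0.01])
```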

There is also a small but real difference 
in the distributions obtained depending upon 
which of the two questions is used. When a 
comparison is made between questions, the 
chi square is 9.36, which is slightly beyond 
the .05 level of confidence. 


Summary 


An experimental study of public stairway 
handrail diameters suggests that a diameter
between 1.75 and 2.00 inches is to be preferred.
A diameter of 2.00 inches feels the safest.
Diameters above 2.00 and below 1.75 inches
are to be avoided.


Received November 25, 1955. 


References 


1. American Standard Safety Code for floor and 
wall openings, railings and toe boards. A 
12-1932. 

2. Liberty Mutual Insurance Company (Loss Pre- 
vention Dept.) Specification for Public Stair- 
way Safety and Production Data Sheet No. 
87. 


Journal of Applied Psychology 
Vol. 40, No. 6, 1956 


Personal History Data as a Predictor of Success in Service 
Station Management * 


Robert S. Soar 
Vanderbilt University 


Selecting the dealer to be given charge of a service station represents a serious problem to the oil company, yet one which remains little explored by scientific selection procedures. The new dealer, after a period of training, is given responsibility for management of a physical plant and inventory in which the oil company has a large financial and good-will investment. If he fails, it takes several months for this to become definite, and meanwhile a twofold loss has occurred.

In past studies of sales personnel, the two 
procedures that have been most consistently 
useful for predicting success in a variety of 
situations have been the personal history 
blank, objectively scored, and the Strong 
Vocational Interest Blank. Since the social 
and educational level of these dealers is such 
as to make crystallized patterns of interest 
less frequent among them than among the 
largely professional group on which the Strong 
blank was standardized, developing a scoring 
scheme for personal history items was chosen 
as the initial approach to the problem. 


Procedure 


The subjects were 29 dealers currently operating service stations in a metropolitan area of about 300,000 population in a Southeastern state. The criterion data consisted of ratings by the dealers' supervisors. Although an objective criterion, such as gallonage or dollar sales, would be preferable on many counts, it was felt to be inappropriate here because of the strong contaminating influence of differences in location. On the other hand, ratings were probably sounder in this case than they frequently are. Five men supervised dealers in the area, with territories systematically rotated so that all raters knew the work of each dealer, but personal attachments were less likely to have been built up strongly than would otherwise be the case. Ratings were also made by one man at the next higher level of management, so that all dealers were rated six times. In addition, since all the raters shared the problem of selecting and training new dealers, all were concerned with development of selection procedures and motivated to rate carefully.

1 This study was made possible by Mr. Gilbert B. Dickey, Jr., President of Trans American Oil Company (formerly American Oil Company of Tennessee).

The aspects of service station management on which the dealers were rated were those that the management staff, through discussion, identified as important. Altogether, 15 aspects of performance were rated for each man. It seemed important to rate various aspects of performance separately, since if some elements were independent of others, a criterion composite would be less meaningful than the independent measures. However, in light of the small number of subjects and the overlap between items, it seemed impractical to analyze all the intercorrelations of the 15 criterion ratings, so these were grouped into clusters that seemed a priori to be related, and the intercorrelations between the clusters were calculated. The clusters were these: Business Sense, involving management and financing; Promotion, involving merchandising and enthusiasm in selling; Emotional Maturity, involving stability and emotional control; Responsibility, involving loyalty and willingness to assume self-direction; and Personality, involving appearance and ability to inspire liking and confidence. The intercorrelations of these ratings are shown in Table 1.
Table 1

Intercorrelations* Between Aspects of Rated Success in Service Station Management

                              2      3      4      5
1. Business Sense           .67    .56    .69    .61
2. Promotion                       73. = 93 9, 95
3. Emotional Maturity                     719    64
4. Responsibility                                .74
5. Personality

* SE = ±.19; p < .001 for all intercorrelations.

These intercorrelations were high enough to make separate analysis of the data for each aspect seem not worthwhile; accordingly, the ratings on the various sub-aspects of performance were totaled and this single rating was used as the measure of success in service station management. The average interrater reliability, by way of Kendall's coefficient of concordance (1), was +.80, significant beyond the .001 level. Since the performance
of individual dealers is likely to be a subject 
for discussion at sales meetings, this figure is 
undoubtedly inflated by what Thorndike (5) 
calls “local reputation,” but on the other 
hand these discussions are directed at capi- 
talizing on the strong points and dealing with 
the weak points of each dealer, so that they 
may also have contributed to the validity of 
the ratings. 
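Kendall's coefficient of concordance can be sketched as follows, assuming each rater's scores are converted to ranks within rater (tied scores get their mean rank) and no tie correction is applied to W. The article does not say whether +.80 is W itself or the average rank correlation between raters derived from it, so the hypothetical helper `mean_spearman_from_w` applies the standard conversion; in this study there would be m = 6 raters and n = 29 dealers.

```python
def ranks_with_ties(values):
    """Rank values from 1 (lowest) upward, giving tied values their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # sorted positions i..j hold ranks i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def kendall_w(ratings):
    """Kendall's W for a list of m raters, each a list of scores for the same n subjects."""
    m, n = len(ratings), len(ratings[0])
    ranked = [ranks_with_ties(r) for r in ratings]
    rank_sums = [sum(r[j] for r in ranked) for j in range(n)]
    mean_sum = m * (n + 1) / 2
    s = sum((rs - mean_sum) ** 2 for rs in rank_sums)
    return 12 * s / (m ** 2 * (n ** 3 - n))


def mean_spearman_from_w(w, m):
    """Average Spearman rank correlation over all pairs of m raters implied by W."""
    return (m * w - 1) / (m - 1)
```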

Since the company did not keep application 
blanks on file for employees, it was necessary 
for the personal history data to be collected 
from the dealers in terms of their status at 
the time they accepted dealerships. This in- 
troduces the possibility of memory error and 
distortion, although Keating, Paterson, and 
Stone (4) have indicated that neither mem- 
ory error nor distortion is as prevalent as 
might be expected, and in fact scarcely ex- 
isted in their sample. It should be noted 
that their subjects were job applicants so 
that some pressure to inflate past responsi- 
bilities and wages might be expected, whereas 
such pressure should be much less prevalent 
here, if it existed at all. 

Since no statistic was likely to give significant results on such small numbers unless the differences were extreme, the item analysis was carried out on a different basis than is usually employed, but one whose rationale is an extension of (or, more accurately, an extrapolation from) current item-selection theory. Katzell (3) has pointed out that requiring stringent significance levels in item selection is likely to result in the selection of items in which chance positive variance is unusually large, so that cross validation results in extensive validity shrinkage. With large numbers of cases the problem is less serious, but with smaller numbers (usually 300 to 500 are mentioned) it becomes serious. Katzell's approach to solving the problem is to require less extreme significance levels as the number of cases becomes smaller, and to use a variant of double cross validation in which the sample is split in half and only those items are retained which reach a relatively liberal level of significance in both halves of the sample.

The procedure used here was an extrapola- 
tion from this latter procedure. The personal 
history blanks were put in rank order by 
total performance rating, and assigned alter- 
nately to two groups. Each group was then 
split into a high and low half, and the item 
analysis carried through separately on each 
group. Items were then retained or thrown 
out on the basis of whether they discrimi- 
nated in the same direction between high and 
low halves of each group, with no considera- 
tion given to whether the discrimination took 
place at a significant level. 
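The retention rule just described can be sketched as below. This is an illustration under stated assumptions, not the author's procedure verbatim: items are scored 0/1, the comparison uses the proportion of each half endorsing the item, and when a group has an odd number of members the extra case falls in the low half. An item is kept, with its sign, only if the high-minus-low difference points the same nonzero way in both alternate groups; no significance test is applied.

```python
def retain_items(subjects):
    """subjects: list of (criterion_score, responses) where responses is a list of 0/1.

    Returns (item_index, direction) for items whose high-minus-low difference has
    the same nonzero sign in both alternately assigned groups.
    """
    ranked = sorted(subjects, key=lambda s: s[0], reverse=True)
    groups = [ranked[0::2], ranked[1::2]]  # alternate assignment by criterion rank
    n_items = len(ranked[0][1])

    def direction(group, item):
        half = len(group) // 2
        high, low = group[:half], group[half:]
        p_high = sum(r[item] for _, r in high) / len(high)
        p_low = sum(r[item] for _, r in low) / len(low)
        return (p_high > p_low) - (p_high < p_low)  # +1, 0 or -1

    retained = []
    for item in range(n_items):
        d1, d2 = direction(groups[0], item), direction(groups[1], item)
        if d1 != 0 and d1 == d2:
            retained.append((item, d1))
    return retained


# Hypothetical data: (criterion rating, responses to four 0/1 items).
example = [
    (8, [1, 1, 1, 0]), (7, [1, 0, 1, 0]), (6, [1, 0, 1, 0]), (5, [1, 0, 1, 0]),
    (4, [0, 0, 1, 1]), (3, [0, 1, 1, 1]), (2, [0, 0, 1, 1]), (1, [0, 0, 1, 1]),
]
print(retain_items(example))  # -> [(0, 1), (3, -1)]
```

Item 1 is dropped because it favors the high half in one group and the low half in the other; item 2 is dropped because it does not discriminate at all.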


Results 


Of the 39 items of personal history col- 
lected, 14 were retained on the basis of this 
analysis to make up the scoring key. They 
were weighted either 2 or 1, depending on 
the degree of differentiation shown in the data 
for the total group. The items were as follows. Weighted 2: over 5'64” in height, no
more than 200 lbs. in weight, between 25
and 39 years of age, held a blue-collar job 
while in high school (jobs involving long 
hours, outside work, and/or unpleasant work- 
ing conditions) including farming, no more 
than one child; weighted 1: two or more sub- 
jects listed as liked in school, two or more sub- 
jects listed as liked least in school, held office 
in high school organization, held job in high 
school involving some aspects of white- and 
blue-collar work (meat cutter, baker), wife 
not working, own home or paying on it, owe 
money, carry other insurance in addition to 
life insurance, have $500 or more in savings, 
work on own car. 
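Applying a 2-or-1 weighted key to a single blank amounts to summing the weights of the keyed responses. The sketch below shows only a few of the items listed above, under hypothetical identifiers chosen for illustration; it is not the study's complete key.

```python
# Hypothetical item identifiers for a subset of the keyed items; weights follow
# the 2-or-1 scheme described above.
SCORING_KEY = {
    "age_25_to_39": 2,
    "no_more_than_one_child": 2,
    "held_high_school_office": 1,
    "owns_or_is_buying_home": 1,
    "savings_500_or_more": 1,
    "works_on_own_car": 1,
}


def personal_history_score(answers, key=SCORING_KEY):
    """Sum the weights of the keyed responses present in one dealer's blank."""
    return sum(weight for item, weight in key.items() if answers.get(item, False))


applicant = {"age_25_to_39": True, "no_more_than_one_child": False,
             "held_high_school_office": True, "works_on_own_car": True}
print(personal_history_score(applicant))  # -> 4
```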




Further data were then collected on an 
additional 23 subjects from two other, some- 
what smaller metropolitan areas in the same 
state. In addition to the 14 items retained 
in the scoring key on the basis of the first 
analysis, eight others were again tried out 
which had discriminated in one group, but 
had shown equal frequencies of responses 
from both high and low halves in the other 
group. 

In the cross-validation sample, the ratings obtained were categorizations of the subjects into three degrees of success. The triserial correlation (2) of personal history scores (based on 14 items) with rated success was +.47 (p < .05). The additional questionable items, when reanalyzed, were found not to discriminate in the new sample.
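A triserial coefficient is a serial correlation between a continuous score and a criterion grouped into ordered categories, assuming the criterion reflects an underlying normal variable. The sketch below uses one common form of Jaspen's serial coefficient (categories ordered from lowest to highest, normal ordinates taken at the category boundaries); for two categories it reduces to the ordinary biserial r. It is an illustration, not a reproduction of the author's computation.

```python
from statistics import NormalDist, mean, pstdev


def serial_correlation(categories):
    """Serial (multiserial) correlation.

    categories: list of score lists, ordered from the lowest to the highest
    criterion category; every category must be nonempty.
    """
    std_normal = NormalDist()
    all_scores = [y for cat in categories for y in cat]
    n = len(all_scores)
    sigma_y = pstdev(all_scores)
    # Normal ordinates at the category boundaries; 0 at both open ends.
    ordinates = [0.0]
    cumulative = 0
    for cat in categories[:-1]:
        cumulative += len(cat)
        ordinates.append(std_normal.pdf(std_normal.inv_cdf(cumulative / n)))
    ordinates.append(0.0)
    numerator = denominator = 0.0
    for j, cat in enumerate(categories):
        diff = ordinates[j] - ordinates[j + 1]  # lower-boundary minus upper-boundary ordinate
        numerator += diff * mean(cat)
        denominator += diff ** 2 / (len(cat) / n)
    return numerator / (sigma_y * denominator)
```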


Summary and Conclusions 


Ratings on 15 aspects of service station 
management were collected for 29 dealers in 
one metropolitan area. Intercorrelations of 
these ratings showed a single over-all rating 
to be appropriate. This over-all rating was 
then used as the criterion against which a 
personal history blank was item analyzed by 
a variant of double cross validation. The items retained in the scoring key were then cross validated on a new sample of 23 dealers drawn from two other cities.

These conclusions were drawn: 


1. A unitary criterion was adequate to de- 
scribe performance in service station man- 
agement. 

2. Of the 39 items studied, 14 were found 
to discriminate more successful dealers from 
less successful, and to retain validity with 
cross validation. 

3. An item analysis procedure based on 
double cross validation was found to be suc- 
cessful with a sample much smaller than usu- 
ally considered adequate for item analysis. 


Received March 5, 1956. 


References 


1. Edwards, A. L. Statistical methods for the behavioral sciences. New York: Rinehart, 1954.

2. Jaspen, N. Serial correlation. Psychometrika, 
1946, 11, 23-34. 

3. Katzell, R. A. Cross-validation of item analysis. 
Educ. psychol. Measmt, 1951, 11, 16-22. 

4. Keating, Elizabeth A., Paterson, D. G., & Stone, 
C. H. Validity of work histories obtained by 
interview. J. appl. Psychol., 1950, 34, 6-11. 

5. Thorndike, R. L. Personnel selection. New 
York: Wiley, 1949. 


Journal of Applied Psychology 
Vol. 40, No. 6, 1956 


Interest Scores in Identifying the Potential Trade School Dropout 


Cecil O. Samuelson and David T. Pearson, Sr. 


The records of the Salt Lake Area Voca- 
tional School reveal that a considerable num- 
ber of its students drop out of school before 
completing a program of study. This is not 
a unique situation among post high school in- 
stitutions, but it does pose the serious ques- 
tion as to why these dropouts occur. This 
modest inquiry was not intended to furnish 
an answer to this total question but was an 
effort to explore the facet of this problem 
that relates to differential interest patterns, 
if any, among trade school students. 

While it is recognized that the factors that 
relate to dropouts are many and varied, it 
was thought that an interest inventory might 
reflect these factors. Specifically, the intent 
was to determine whether the group that 
stayed in school to complete a course could 
be distinguished from the group that dropped 
out before completion on the basis of pat- 
terns of interest as measured by the Kuder 
Preference Record, Form CH. 

Inasmuch as all the training departments 
included in this study were basically mechani- 
cal in nature, it was hypothesized that stu- 
dents successful in the sense of completing 
the course would generally score high in the 
mechanical section of the inventory, while 
the interest patterns of those who dropped 
out of school before completion would show 
a different sort of profile. Since the Kuder 
Preference Record was available on the stu- 
dents in this school, it was decided to investi- 
gate the problem through the use of this in- 
strument. 


The Sample 


The sample on which this study was made 
consisted of two groups of students—55 who 
dropped out before completion and 48 who 
completed courses; these were the total num- 
ber in both categories on whom Kuder Pref- 
erence Record Scores were available for the 
period covered by the study. All of those in 
both groups were from the auto mechanics, 
auto body and fender, diesel mechanics, car- 
pentry, electricity, electronics, machine shop, 


and welding departments of the school. The 
first group of 55 consisted of those who had 
dropped out of the trade training programs 
at the Salt Lake Area Vocational School dur- 
ing the academic years 1953-54 and 1954-55. 
Those who dropped out to take employment 
or enter apprenticeship programs in the field 
for which they were training were not in- 
cluded in this group. 

The second group of 48 consisted of those 
who either completed their programs during 
this period or who dropped out to enter em- 
ployment or apprenticeship training in the 
same field for which their vocational school 
programs were preparing them; 34 completed 
their training while 14 terminated formal 
school training to enter employment training 
situations in the same field. 

Both groups were composed entirely of 
males, and, although no effort was made to 
pair the groups, they are roughly similar as 
concerns age, marital status, and amount of 
education. 


Limitations 


Of the various limitations to a study of this sort, attention will be drawn only to the more obvious. First, there is the limitation of the instrument itself; it is assumed that this point needs no elaboration here. A second limitation concerns the sample. The school does not require tests as part of the admissions procedure but does periodically give a test battery which all students must take before they are considered fully registered. In this institution students may enter at any time, so the procedure is to test as one group all those who have entered or applied for admission since the last administration of the battery. This means that in some instances students may have been in school for some time before they were tested; most students, of course, would be tested before or as soon as they entered school. In this study, 44% of the total group took the tests on or before the day they entered school. Nine per cent took the tests within the next ten days after entering school; 8% took the tests within the second ten-day period; and 36% took the tests within the third ten-day period after registration. Four students took the tests after they had been in school more than one month.

One facet of this point is that it is not 
known what effect these variable amounts of 
training may have had on these interest 
scores. A second facet in this connection is 
that some students enter school and drop out 
before they have been tested; this group is 
very small and was not included in the study, 
but this does represent a bias in sampling 
which should be noted. 

Also, it must be recognized that both of these groups may already be somewhat selected in favor of an interest in mechanics.
Presumably, few people would enroll in a 
mechanical course without at least some feel- 
ing that they would like to do that type of 
work. Furthermore, some of these students 
have had work experience before coming to 
school and consequently have some first-hand 
experience to support their decisions to enter 
vocational school in the first place. It would 
be expected that those whose felt inclinations 
were not mechanical would not have enrolled 
in these departments in the first place. 


Discussion 


The V scores on all of the individual inven- 
tories included in this study were within the 
limits prescribed in the published instruc- 
tions. Presumably, then, these students un- 
derstood the directions, had sufficient intelli- 
gence to comprehend the inventory items, 
and actually marked the answer sheets in an 
acceptable manner. 

To facilitate consideration of the inventory 
scores, the differences between the means of 
the two groups were computed in each of the 
inventory categories. Also, composite pro- 
files were made for each of the two groups 
separately which were then plotted on the 
same profile sheet. From this vantage point 
the following observations were made: 

First, it was apparent that there were no great differences between the two groups in any of the test categories. While there was some variability, of course, none of the differences between the means was significant, with the single exception of category 8, Social Service; and this exception was significant only at the .05 level.

Fig. 1. Composite profiles of those who completed trade courses and those who dropped out before completion.
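The article does not name the test used for the differences between means. Assuming an ordinary Student's t with pooled variance, each category comparison could be computed as in this sketch; with 55 dropouts and 48 completers there would be 101 degrees of freedom, for which the two-tailed .05 critical value is about 1.98.

```python
from math import sqrt
from statistics import mean, variance


def pooled_t(a, b):
    """Student's t for the difference between two independent means (pooled variance).

    Returns (t, degrees_of_freedom).
    """
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    se = sqrt(pooled_var * (1 / na + 1 / nb))
    return (mean(a) - mean(b)) / se, na + nb - 2


# Hypothetical scores in one Kuder category (not the study's data).
t, df = pooled_t([1, 2, 3, 4, 5], [3, 4, 5, 6, 7])
print(f"t = {t:.2f} on {df} df")
```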

A second observation was the relatively flat 
profiles shown by both groups, as indicated 
in Fig. 1. This would tend to suggest the 
idea that the interests of trade school stu- 
dents as reflected by this inventory are broad 
and diverse rather than being highly crystal- 
lized in the mechanical area. Also, the dif- 
ferences, slight as they were, did not suggest 
distinctive patterns. While we cannot be cer- 
tain that the 75th percentile, suggested by 
Kuder as representing a point of significant 
interest intensity, has meaning in this par- 
ticular connection, even by that standard the 
profiles lack points of distinctive strength. 
It will be noted that the single exception from this point of reference was the mechanical score for the group that completed courses, and even that score was just barely in the so-called significant area, at the 77th percentile. It might have been expected that the mechanical category would be the point of greatest interest for the completions, but it would also have been expected that the strength shown in this area would be quite decisive rather than the fairly mild suggestion indicated.

Still using Kuder’s 75th percentile as a 
point of reference, it may further be noted 
that while the combining of scores is useful 
in considering total samples such as these, 
the process does obliterate the observable 
variability in individual scores. For example, 
using the mechanical category on the inven- 
tory with the group that completed courses, 
seven of the 48 had scores that would place 
them below the 50th percentile, while only 
27 of the 48 were above the 75th percentile. 
The extreme caution with which these scores 
must be used becomes apparent when it is 
remembered that all of these 48 students 
successfully completed courses in these areas 
of mechanics; yet 44% of them had me- 
chanical interest scores below the recom- 
mended 75th percentile, while about 15% of 
them had scores below the 50th percentile. 
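The two percentages follow directly from the counts given for the 48 completers (27 above the 75th percentile, 7 below the 50th):

```latex
\frac{48 - 27}{48} = \frac{21}{48} \approx 0.44,
\qquad
\frac{7}{48} \approx 0.15
```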

Attention should also be drawn to yet another pertinent observation. As previously noted, the 48 in the one group consisted of 34 who had completed their courses in school and 14 who had dropped out of school before graduation to enter employment training in situations for which their school work had been preparing them. It was thought that these 14 might form a separate group in terms of interests, and the data were analyzed accordingly. However, no significant differences were revealed.


Conclusions 


Consideration of the data presented here 
would seem to support these conclusions: 

1. The present use of the Kuder Preference 
Record in these trade training departments of 
this vocational school is of very limited value 
in helping students evaluate their decisions to 
become mechanics. 

2. On the basis of this study and within 
its limitations, the Kuder is not helpful in 
distinguishing between those who will com- 
plete mechanical courses and those who will 
not. 


Received December 27, 1955. 




Journal of Applied Psychology 
Vol. 40, No. 6, 1956 


The Naval Knowledge Test 


Albert S. Glickman 
U. S. Naval Personnel Research Field Activity, Washington 1 


The Naval Knowledge Test (NKT) arose from a need to measure motivation to excel at the Navy's Officer Candidate School (OCS). It assumes that, other factors being constant, one who has accumulated more naval knowledge before enrollment at OCS is more interested in naval matters and will consequently find it easier to meet the work demands of OCS, be less distracted by petty annoyances, and be willing and able to devote more energy to the serious pursuit of achievement at OCS.


Description 


The first experimental form of the NKT 
consisted of 136 items arranged under the 
following headings: 

1. Identification or definition of names, 
words, phrases, slang, and symbols (53 
items) ; 

2. Information about ships that have been 
prominent in naval history (19 items) ; 

3. Events and locations that have been 
prominent in naval history (20 items); 

4. Prominent naval personalities, past and 
present (17 items) ; 

5. Knowledge of naval organization and
practices (27 items). 

These sections were not considered as sub- 
tests but only as organizational clarifiers. 

Administration time was 45 minutes— 
enough for practically all subjects to com- 
plete the test. 


Construction 


In writing items for the test the author ad- 
hered to requirements that: 

1. The subject matter of the items should bear upon matters pertaining to the Navy, past and present, and to the seas and ships in general.

2. The information or knowledge required to answer the questions correctly should not be of a highly technical nature, but should be of the sort fairly readily available in the "public domain," so that acquisition of such knowledge and information might be considered to represent a manifestation of interest and motivation rather than special opportunity or training of a naval or seafaring sort.

1 This research was conducted while the author was with the American Institute for Research, working under contract Nonr 890(01) with the Office of Naval Research. This article draws from contractor's reports on NKT construction and validation, prepared as Bureau of Naval Personnel Technical Bulletins (2, 3). The viewpoints expressed herein are not to be construed as those of the U. S. Navy.

These steps were followed in the writing of 
the items: First, the author developed ideas 
for, and wrote items on, a “free association” 
basis. That is to say, he drew from his own 
store of “naval knowledge” for material for 
items. Second, additional material for items 
was sought in various naval histories, in 
standard naval and maritime texts and hand- 
books, and in the Bureau of Naval Personnel 
magazine, All Hands. Third, reference was 
made to the file of Naval Knowledge items 
used in subject matter tests by the Orienta- 
tion Section of the Officer Candidate School 
as another source of ideas. 

After all the items had been put in the form of five-alternative multiple-choice questions, they were submitted to several naval officers for comment or correction, to ensure that they were in conformity with fact and naval usage.

In order to promote clarity and facilitate 
instructions to the subjects the test items 
were sorted into the five classifications indi- 
cated above. 

Validation 


Population. The test was administered to three companies of Class 13 at the Officer Candidate School in the first week at the school, during which period the students are processed, tested, and organized for administrative purposes. They do not attend any classes during this week, and so are only incidentally exposed to "naval knowledge" before taking the test. The numbers of cases were as follows: Company E, 165; Company F, 161; and Company G, 133.

Table 1

Correlations of Naval Knowledge Test Scores with OCS First Quarter Academic Sum

Class 13        N       r
Company E      165    .524
Company F      161    .432
Company G      133    .403

Criterion. The criterion employed in this 
investigation consisted of the sum of quar- 
terly grades (covering the first month of 
training) for the six major areas of academic 
training at OCS. This was considered to be 
a practical criterion in that about one-half of 
the cases of disenrollment at OCS take place 
at the end of the first quarter, action being 
based in largest part upon academic perform- 
ance during this period. 

The validity of the test. Correlations were 
obtained between NKT scores and the cri- 
terion. These were computed separately for 
each company. These correlations are found 
in Table 1. 

The results, to this point, indicated that 
the coefficient of validity for the NKT, in 
terms of predicting early academic achieve- 
ment at OCS was of sufficient magnitude to 
warrant further study. 

Multiple prediction of criterion. In practical terms, the utility of the NKT as a selection instrument depends on the degree to which it taps something other than what is assessed by the Officer Qualification Test (OQT), the principal test employed by the Office of Naval Officer Procurement (ONOP) in screening OCS applicants from civilian life.²

In order to establish whether the NKT could materially improve selection (as demonstrated by prediction of performance at OCS), multiple correlations were computed for all cases in our sample for whom OQT scores were available.³

Using First Quarter Academic Sum as the criterion, intercorrelations were obtained as indicated in Table 2.⁴

The differences between the zero-order correlations of the OQT with Academic Sum and the multiple correlations obtained by adding the NKT score to the battery (Table 2, Column E) indicated that the increase in predictive efficiency was substantial.
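The multiple correlations in Table 2 can be checked from the zero-order coefficients with the standard two-predictor formula, sketched below. Using Company E's values (NKT vs. Academic .527, OQT vs. Academic .627, NKT vs. OQT .408), it gives an R of about .694, matching Column D.

```python
from math import sqrt


def multiple_r(r_y1, r_y2, r_12):
    """Multiple correlation of a criterion y with predictors 1 and 2, from zero-order rs."""
    r_squared = (r_y1 ** 2 + r_y2 ** 2 - 2 * r_y1 * r_y2 * r_12) / (1 - r_12 ** 2)
    return sqrt(r_squared)


# Company E, Class 13 (Table 2): r(NKT, Academic), r(OQT, Academic), r(NKT, OQT).
R = multiple_r(0.527, 0.627, 0.408)
print(f"R = {R:.3f}")
```

The formula shows why a lower NKT-OQT correlation helps: the shared term 2 r_y1 r_y2 r_12 shrinks, so more of the NKT's validity is new information.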

These findings indicated that the NKT 
might provide an instrument for improving 


2 It developed that Class 13 was drawn almost entirely from nonfleet sources. Less than 1% of the membership of this class had been drawn from the Navy's enlisted personnel. (Fleet personnel often constitute as much as one-third of a class.) Thus the results are interpretable as applicable to a group having no previous experience in the Navy.

3 For the student populations reported here, a minimum Navy Standard Score (NSS) of 40 had been one of the prerequisites for acceptance as an Officer Candidate. (OQT mean NSS = 50, standard deviation = 10; based on a population of U. S. college graduate applicants to ONOP [1].)

4 Attrition in the N from the original sample consists of cases of personnel with previous Navy experience (who do not take the OQT) and others who, for various reasons, did not have OQT scores in their records.


Table 2

Intercorrelations Between Criterion (First-Quarter Academic Sum) and Predictors (Naval Knowledge Test and Officer Qualification Test), and Multiple-Correlation Coefficients*

                       Col. A     Col. B    Col. C     Col. D     Col. E
                       NKT vs.    NKT vs.   OQT vs.    Multiple   Difference Between
Class 13         N     Academic   OQT       Academic   R          Cols. C and D
Company E       140    .527       .408      .627       .694       .057
Company F       146    .377       .284      .414       .495       .081
Company G       117    .328       .239      .499       .543       .044
Average rs**           .414       .312      .519       .584       .065

* See footnote 4.
** Obtained by Fisher's r to z transformation.
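Averaging correlations through Fisher's z, as in the last row of the table, can be sketched as follows. This assumes an unweighted mean of the z values (the article does not say whether the companies were weighted by N); for the Column A values it reproduces the tabled .414 to within rounding.

```python
from math import atanh, tanh


def average_r(rs):
    """Average correlations via Fisher's r-to-z transformation (unweighted mean of z)."""
    return tanh(sum(atanh(r) for r in rs) / len(rs))


# Column A of Table 2: NKT vs. Academic Sum for Companies E, F and G.
print(round(average_r([0.527, 0.377, 0.328]), 3))
```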




Table 3

Intercorrelations Between Criterion (First-Quarter Academic Sum) and Predictors (Naval Knowledge Test and Officer Qualification Test), and Multiple-Correlation Coefficients

                            Col. A     Col. B    Col. C     Col. D     Col. E
                            NKT vs.    NKT vs.   OQT vs.    Multiple   Difference Between
Class 15              N     Academic   OQT       Academic   R          Cols. C and D
Company A            133    .349       .197      .548       .601       .053
Company B            134    .528       .352      .546       .653       .107
Company C            141    .391       .293      .566       .613       .057
Class 15 average rs         .426       .282      .553       .623       .070
Class 13 average rs         .414       .312      .519       .584       .065
Average r differences,
Class 15 - Class 13         .012       -.030     .034       .039       .005


efficiency of screening OCS applicants. 
Hence, steps were taken to refine the test 
and to investigate the validity of the revised 
form. 


Construction of a Short Form of the NKT 


As a first step toward increasing the effi- 
ciency of the NKT as a potential part of an 
OC selection battery, an item analysis was 
performed. The aim was to select for a 
shortened test those items having highest cor- 
relation with the criterion (Academic Sum), 
and which demonstrated measurement of fac- 
tors other than those already measured by 
the OQT by having no greater correlation 
with the OQT than with the Academic Sum. 
To achieve this, each item was separately 
correlated with OQT score and with Aca- 
demic Sum. Item difficulty estimates were 
also computed. On the average, items were correctly answered by 59.6% of the sample.⁶

Items were chosen for a short form which (a) had correlations between item and Academic Sum of .10 or greater, and (b) correlated at least as highly with Academic Sum as with OQT. These requirements were met by just over half (69) of the items. On the average, 59.9% of the sample gave the right answer to these items. Each of the five classifications of items in the original form contributed about the same proportion of items to the 69-item key.

5 Parallel results could be anticipated using Second Quarter (mid-course) grades, inasmuch as the intercorrelations of First and Second Quarter grades for students still enrolled at the latter time, over seven companies of Class 13, had been found to range from .95 to .97 (not corrected for attrition in number of cases and restriction in range due to disenrollments from the fourth to the eighth week).

6 Detailed item-analysis results are reported in (2).
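The two selection rules for the short form can be sketched as follows. The article says only that each item was "separately correlated" with OQT score and Academic Sum; this sketch assumes a point-biserial coefficient (Pearson r between a 0/1 item and a continuous score), and `keep_item` is a hypothetical name for the combined rule.

```python
from math import sqrt
from statistics import mean, pstdev


def point_biserial(item, scores):
    """Pearson r between a 0/1 item and a continuous score."""
    p = mean(item)
    if p == 0 or p == 1:
        return 0.0  # an item everyone passes (or fails) carries no information
    m1 = mean(s for s, x in zip(scores, item) if x == 1)
    m0 = mean(s for s, x in zip(scores, item) if x == 0)
    return (m1 - m0) / pstdev(scores) * sqrt(p * (1 - p))


def keep_item(item, academic, oqt, min_r=0.10):
    """Rule (a): r(item, Academic Sum) >= .10; rule (b): r(item, Academic) >= r(item, OQT)."""
    r_acad = point_biserial(item, academic)
    r_oqt = point_biserial(item, oqt)
    return r_acad >= min_r and r_acad >= r_oqt
```

Rule (b) favors items that predict grades without merely duplicating what the OQT already measures.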


Generalized Validity of the 69-Item Key 
for the NKT 


Since it appeared that, at least in the case 
of civilian applicants, use of the NKT as a 
supplement to the OQT held considerable 
promise for improving the selection of officer 
candidates for the Navy, the original form 
of the NKT was used for follow-up studies 
at the Officer Candidate School on a new 
class (Class 15), and scored using the 69- 
item key. 

Population. As in the original validation 
the test was administered in the first week of 
school. Data analysis was restricted to the 
population of students (with OQTs) procured 
from civilian sources. Three companies of 
Class 15 supplied the following numbers of 
cases: Company A—133, Company B—134, 
and Company C—141. 

Criterion. The criterion used was the same 
as for the earlier analysis, First Quarter Aca- 
demic Sum. 

Validity of the 69-item key. When the key 
of 69 items derived by analysis of Class 13 
responses was applied to the three companies 
of Class 15 the correlations listed in Column 
A of Table 3 were found. Reference to the average rs⁷ found in the two classes shows that the 69-item key used for scoring Class 15 does as good a job of prediction as the full 136-item test did for Class 13.

Multiple prediction of criterion. Once again multiple correlations were run, using the OQT and the NKT to predict academic grades. The pattern of results from the Class 15 data corroborates the Class 13 findings, as further inspection of Table 3 shows. The OQT validity coefficients (Column C) are a bit higher for the Class 15 sample (by .034 on the average), and the multiple R (Column D) is higher by much the same amount (.039), while the correlation between the NKT and the OQT (Column B) is slightly lower than before (by .030). The corresponding values in Column E show that the 69 items of the NKT continue to add an increment of .070 to the validity coefficient obtained with the OQT alone. Accordingly, the percentage of criterion variance accounted for rises from about 31% to about 39%, a gain of about 8 percentage points. This compares with the .065 validity increment contributed by the whole test in the Class 13 analysis.
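The variance figures follow from squaring the validity coefficients. The Python sketch below uses the standard two-predictor formula for a multiple correlation. The OQT and NKT inputs in the first call are hypothetical, and the second part only re-derives the text's rounded 31% and 39% from an OQT-only r of about .557 plus the .070 increment.

```python
import math


def multiple_r(r_y1, r_y2, r_12):
    """Multiple correlation of a criterion y with two predictors.

    Standard two-predictor formula:
    R^2 = (r_y1^2 + r_y2^2 - 2 * r_y1 * r_y2 * r_12) / (1 - r_12^2)
    """
    r_sq = (r_y1 ** 2 + r_y2 ** 2 - 2 * r_y1 * r_y2 * r_12) / (1 - r_12 ** 2)
    return math.sqrt(r_sq)


# Hypothetical inputs: OQT validity, NKT validity, OQT-NKT correlation
print(round(multiple_r(0.56, 0.40, 0.30), 3))

# Variance accounted for, from the rounded figures in the text
r_oqt = math.sqrt(0.31)   # OQT alone: about 31% of criterion variance
big_r = r_oqt + 0.070     # the NKT adds about .070 to the validity coefficient
print(round(big_r ** 2, 2))  # 0.39, i.e. about 39%
```

The formula also shows why a low NKT-OQT correlation matters: the smaller r_12 is, the less of the NKT's validity is already carried by the OQT.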


Discussion 


On the basis of experience with the OCS samples reported here, it appears that a form of the NKT of about 70 items, requiring about 25 minutes of administration time, can contribute a substantial increment of predictive efficiency to the screening of civilian applicants for the Navy’s Officer Candidate School, beyond that obtainable by using the OQT as the single selection instrument.⁸

Summary 


The Naval Knowledge Test rests on the 
assumption that the civilian who accumulates 
more knowledge about the Navy and mari- 
time matters in general before applying for 
the Navy’s Officer Candidate School is the 
person who will be more strongly motivated 
toward achieving academic excellence at the 
School. 

7 Obtained by z transformation. 

8 Experimental administration of the NKT to OCS 
applicants is taking place at ONOP branches. Fur- 
ther analysis is being conducted by the Navy to see 
whether validity holds up under operational condi- 
tions of administration and use. 




The original form of the test contained 136 
items dealing with: identification or defini- 
tions of names, words, phrases, slang, and 
symbols; information about ships that have 
been prominent in naval history; events and 
locations that have been prominent in naval 
history; prominent naval personalities, past 
and present; and knowledge of naval organi- 
zation and practices. 

The test was administered to samples of new officer candidates who had no previous active duty in the Navy, before they began academic work at OCS. When the NKT was included in a battery with the Officer Qualification Test, currently the principal screening instrument, prediction of academic grades obtained during the first month of school was appreciably improved over that obtained with the OQT alone.

By item analysis, 69 of the original items were chosen that best predicted academic success while keeping overlap with the OQT to a minimum. These were drawn in about equal proportions from the five original types of items.

In order to check on the generalization of 
validity, the NKT was administered to an- 
other sample of “naive” officer candidates in 
a new incoming class and scored with the 
69-item key. The shorter form of the NKT 
predicted academic achievement as well as 
the original form had done and added the 
same increment of predictive efficiency when 
used in combination with the OQT. 

Further studies of the validity and practicality of the NKT in an operational setting, directly involving OCS applicants, are in progress.


Received February 16, 1956. 


References 


1. Bureau of Naval Personnel. The Navy Officer 
Qualification Test, Forms 4, 5 and 6: I. De- 
velopment and standardization. Washington, 
D. C.: Bureau of Naval Personnel, Research 
Division, 1952 (NavPers 18318). 

2. Glickman, A. S. The Naval Knowledge Test: construction and validation. U. S. Bur. Nav. Personn., Tech. Bull., 1954, No. 54-7.

3. Glickman, A. S., & Vallance, T. R. Development and validation of an experimental battery to select officer candidates for the Navy. U. S. Bur. Nav. Personn., Tech. Bull., 1954, No. 54-12.


Journal of Applied Psychology 
Vol. 40, No. 6, 1956 


Development of a Structured Disguised Personality Test¹


Bernard M. Bass 


Louisiana State University 


The ease with which applicants for business and industrial positions can fake traditional personality inventories has stimulated attempts to develop distortion-free tests. Since 1945, interest has focused on forced-choice procedures as a solution to the problem. Yet in an unpublished study we found that sales applicants can readily fake certain types of forced-choice inventories, such as the Gordon Personal Profile. Travers (8) has discussed the possibly ubiquitous fakability of forced choice.

Structured disguised personality tests represent another approach to the problem. Such tests combine the measuring properties of projective techniques with the objective scoring of inventories. Various types have been proposed, such as error-choice, sentence completion, word or paragraph interpretation, mutilated figures, and word association. Campbell (3) has surveyed the application of these techniques to assessing attitudes.

The present study aimed to develop and evaluate a multiscale proverbs² test to assess selected personality variables deemed significant for occupational success. In 1938, Murray’s Explorations in Personality (6) briefly presented lists of proverbs whose acceptance or rejection had been used to assess various personality needs. Little evidence was included concerning the validity or reliability of this procedure. Recently, Baumgarten (1) reported using proverb selection as a means of assessing worker attitudes. Psychiatrists have long used proverb interpretation to assess intellectual functioning.


1 This study was aided by a grant from the Louisiana State University Graduate Council on Research. The author was assisted in the analyses by Ki Suk Kim, Charles H. Coates, and George Palmer. He wishes to thank Donald T. Campbell, Cecil Gibb, Gerald McCullough, and Arnold Gebel for their help in data collection.

2 The term “proverb” will be used loosely to include a variety of statements such as maxims, adages, apothegms, aphorisms, and sayings.


The Lists of Proverbs 


With Murray’s classification of needs as a guide, 13 a priori lists of 20 proverbs each were constructed from selected sources, including the lists in Explorations in Personality, Bartlett’s Familiar Quotations (2), Richmond’s Modern Quotations (7), and private lists of Louisiana Negro proverbs. Forty additional proverbs were in the first inventory but were not scored. The lists were intermixed in the inventory of 300 proverbs presented to the examinees. The directions were substantially as follows:

“This is a test of your attitudes toward 
various famous sayings. Read each one care- 
fully to find its true meaning to you. Indi- 
cate whether you agree, disagree, or are un- 
certain about the statement. If you cannot 
make up your mind, it will help if you ask 
yourself if you believe the statement is usu- 
ally true or usually false.” 

On each scale, two points were assigned to “yes” responses, one point to “?” responses, and no points to “no” responses. Each scale score was the simple sum of the points earned on that scale’s items.
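A minimal Python sketch of this scoring rule, assuming responses are stored per item as "yes", "?", or "no" and that each scale is a list of item numbers (the item numbers here are hypothetical). With 20 proverbs per list, a scale score can range from 0 to 40.

```python
# Points per response, as described in the text: yes = 2, ? = 1, no = 0
POINTS = {"yes": 2, "?": 1, "no": 0}


def scale_score(responses, scale_items):
    """Sum the points for the items keyed to one scale.

    responses: dict mapping item number -> "yes", "?", or "no"
    scale_items: iterable of item numbers belonging to the scale
    """
    return sum(POINTS[responses[item]] for item in scale_items)


# Hypothetical answers; items 1, 14, and 27 are keyed to one scale
responses = {1: "yes", 14: "?", 27: "no", 40: "yes"}
print(scale_score(responses, [1, 14, 27]))  # 3
```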

One example proverb from each of the 13 
scales is as follows: 

1. Material Comfort. (All the money in the world is useless if you can’t spend it just as you like.)

2. Sex. (It is not very difficult to fall in love.)

3. Harm Avoidance. (The doors of death are ever open.)

4. Achievement. (To obtain success by your own efforts is the greatest joy in life.)

5. Affiliation. (There is no satisfaction without a companion to share it.)

6. Deference. (In matters of conduct it is best to conform to custom.)

7. Autonomy. (It is best to stand alone when in trouble.)

8. Aggression. (To forgive an enemy is a sign of weakness.)

9. Abasement. (Outside show never makes up for inner worth.)

10. Rejection. (Never trust a flatterer.)

11. Nurturance. (Giving is always better than receiving.)

12. Superego Strength. (No degree of temptation justifies any degree of sin.)

13. Irritability. (Only a statue’s feelings are not easily hurt.)


Table 1
Tetrachoric Intercorrelations Among 13 Famous Sayings Scales and Personal Characteristics of Subjects (N = 400)

                          2    3    4    5    6    7    8    9   10   11   12   13  Age  M-F Educ  N-S
1. Material Comfort      60   58   60   20   43   42   44   47   50   46   37   48  -15  -02  -21  -05
2. Sex                        41   48   17   30   40   48   32   50   24   11   48  -27   09  -36   18
3. Harm Avoidance                  68   48   50   31   39   41   37   48   35   57  -03  -12  -09   13
4. Achievement                          40   46   40   32   38   48   49   38   45  -10   06  -28  -02
5. Affiliation                               49   01   05   52   06   60   51   23  -07  -13  -05  -09
6. Deference                                      15   23   55   21   61   51   42  -15  -01  -13  -02
7. Autonomy                                            59   32   55   25   13   48  -10  -21  -25   27
8. Aggression                                               31   56   25   19   51  -07  -06  -07   16
9. Abasement                                                     34   67   53   51  -19  -18  -26   07
10. Rejection                                                         18   13   43  -10  -02  -23   11
11. Nurturance                                                             69   44  -21  -21  -08   08
12. Superego Strength                                                           42  -08  -20   05  [?]
13. Irritability                                                                    -21  -14  -20   09
Age                                                                                       15   62  -13
Male vs. Female                                                                               -08  -22
Education                                                                                          [?]
North vs. South

Note. Decimal points omitted. M-F = male vs. female; Educ = education; N-S = North vs. South. [?] = illegible in the source.
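The article does not say how the tetrachoric coefficients in Table 1 were computed. One well-known approximation is the cos-pi formula, sketched below under the assumption that each pair of variables was dichotomized (for example at the median) into a 2x2 table. It approximates, but is not identical to, the maximum-likelihood tetrachoric r, and the counts in the example are hypothetical.

```python
import math


def tetrachoric_cos_pi(a, b, c, d):
    """Cos-pi approximation to the tetrachoric correlation.

    Cell counts of a 2x2 table:
                  Y high   Y low
        X high      a        b
        X low       c        d
    r_t ~= cos(pi / (1 + sqrt(a*d / (b*c))))
    """
    if b * c == 0:
        return 1.0   # an empty off-diagonal cell drives the formula to +1
    if a * d == 0:
        return -1.0  # an empty diagonal cell drives the formula to -1
    return math.cos(math.pi / (1 + math.sqrt((a * d) / (b * c))))


# Hypothetical split of 400 cases at the medians of two scales
print(round(tetrachoric_cos_pi(130, 70, 70, 130), 2))
```

When ad equals bc (no association) the argument is pi/2 and the estimate is 0; swapping the diagonals flips the sign, as a correlation should.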


Factor Analysis 
Subjects 


The 300-item form was administered to ap- 
proximately 2,000 cases in a variety of sam- 
ples drawn from different segments of the na- 
tional population. These included Southern 
department store saleswomen and peniten- 
tiary inmates, high school seniors in New 
Hampshire, student nurses in the same state, 
adult residents of Los Angeles, supervisors of 
a Louisiana petroleum refinery, Midwestern 
and Southern college students, Louisiana pub- 
lic school teachers, rural Southern high school 
sophomores and seniors, Marine Corps en- 
listed men, Chicago cosmetic salesmen, etc. 

From these 2,000 cases, a sample of 400 was constructed that represented, as well as circumstances permitted, the American population most likely to be assessed routinely by the final test, should such a test prove useful.³ The 400 subjects had a mean age of 26.5 years, with a standard deviation of 10 years. Their mean education was 14.5 years, with a standard deviation of 1.7 years. Forty-six per cent of the subjects were from the North, Midwest, and West Coast, while 54% were from the South. Sixty per cent were male and 40% were female.

3 It was assumed that a personality inventory would most commonly be used for screening applicants for professional, managerial, and technical occupations. We attempted a crude miniature reproduction of such a sample. Entry into such occupations usually occurs after some college training and between the ages of 20 and 35. More are male than female. Approximately 60% live in the North. Lack of opportunity forced us to include in the sample more Southerners and more females than originally desired.


Results


Table 1 shows the tetrachoric intercorrelations among the 13 scales and the demographic data (age, sex, education, and geographic region). A multiple centroid factor analysis was performed on this matrix of intercorrelations. Table 2 shows the final rotated factor matrix following seven rotations of the five factor axes.⁴
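No computational detail is given for the multiple centroid analysis. As a rough illustration only, the sketch below performs one extraction step of Thurstone's centroid method on a small artificial matrix with communalities in the diagonal. A full analysis would repeat the step on the residual matrix (with sign reflection where column sums turn negative) and then rotate the axes, as the text describes; neither is shown, and the function names are hypothetical.

```python
import math


def first_centroid_factor(r):
    """One step of the centroid method: loadings on the first factor.

    r is a square correlation matrix (list of lists) with communality
    estimates in the diagonal. The sketch assumes every column sum is
    positive, so no variables need to be reflected.
    Loading of variable j = (sum of column j) / sqrt(sum of all entries).
    """
    n = len(r)
    col_sums = [sum(r[i][j] for i in range(n)) for j in range(n)]
    total = sum(col_sums)
    return [s / math.sqrt(total) for s in col_sums]


def residual_matrix(r, loadings):
    """Correlations left after removing a factor: r_ij - a_i * a_j."""
    n = len(r)
    return [[r[i][j] - loadings[i] * loadings[j] for j in range(n)]
            for i in range(n)]


# Check on an artificial one-factor matrix: r_ij = l_i * l_j
true_loadings = [0.8, 0.6, 0.7]
r = [[li * lj for lj in true_loadings] for li in true_loadings]
a = first_centroid_factor(r)
print([round(x, 3) for x in a])
```

On a matrix generated by a single factor with positive loadings, the centroid step recovers those loadings exactly and leaves a residual matrix of zeros.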

4 The unrotated factor matrix and the final transformation matrix have been deposited with the American Documentation Institute. Order Document No. 5042 from the ADI Auxiliary Publications Project, Photoduplication Service, Library of Congress, Washington 25, D. C., remitting in advance $1.25 for microfilm or $1.25 for photocopies. Make checks payable to Chief, Photoduplication Service, Library of Congress.


Table 2
Final Rotated Factor Matrix

Factors: I, Conventional Mores; II, Hostility; III, Age-Education; IV, Fear of Failure; V, Sampling Imbalance.

Variable                    I     II    III     IV      V     h²
1. Material Comfort       .39    [?]   -.10    .42   -.21    .62
2. Sex                    .16    .57   -.39    .30   -.29    .68
3. Harm Avoidance         .44    .28    .01    .62   -.09    .66
4. Achievement            .31    .18   -.04    .84    .03    .84
5. Affiliation            .70   -.15    .00    .24   -.05    .57
6. Deference              .66    [?]   -.09    .26   -.14    .54
7. Autonomy               .06    .65   -.12    .30    .37    .67
8. Aggression             .16    .76    .00    .18    .12    .65
9. Abasement              .68    .22   -.23    .18    .19    .63
10. Rejection             .10    .64   -.12    .34   -.01    .55
11. Nurturance            .85    .07   -.14    .14    .21    .81
12. Superego Strength     .77    .06    .03    .08    [?]    .62
13. Irritability          .45    .49   -.19    .26    .04    .55
Age                      -.13    .00    .71   -.04   -.06    .53
Sex                      -.10    .06   -.04   -.21   -.44    .25
Education                 .06   -.06    .81   -.25   -.04    [?]
North vs. South          -.15    .09    [?]    .22    .41    .34

Note. [?] = illegible in the source.


It was fairly easy to find meaning in the final solution and to label the factors accordingly. The factors and the variables most highly loaded on each were as follows:


Factor I: Conventional Mores
    11. Need for Nurturance        .85
    12. Superego Strength          .77
     5. Need for Affiliation       .70
     9. Need for Abasement         .68
     6. Need for Deference         .66
Factor II: Hostility
     8. Need for Aggression        .76
     7. Need for Autonomy          .65
    10. Need for Rejection         .64
Factor III: Age-Education
        Education                  .81
        Age                        .71
Factor IV: Fear of Failure
     4. Need for Achievement       .84
     3. Need for Harm Avoidance    .62


The last factor, V, concerned sex and geography and was a consequence of accidental sampling in which Southern males and Northern females were slightly overrepresented.⁵ However, the three test factors, I, II, and IV, were independent of both demographic factors, III and V.
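The h² column of Table 2 is each variable's communality: the sum of its squared loadings on the five factors. That makes a quick consistency check on the printed loadings possible. In the Python sketch below, the loadings are read from Table 2, and the two Aggression rows compare a Factor II loading of .16 with one of .76.

```python
def communality(loadings):
    """h^2 for one variable: the sum of its squared factor loadings."""
    return sum(x * x for x in loadings)


# Loadings of Achievement on Factors I-V, read from Table 2
achievement = [0.31, 0.18, -0.04, 0.84, 0.03]
print(round(communality(achievement), 2))   # 0.84, the tabled h^2

# Aggression: a Factor II loading of .16 gives h^2 of about .10,
# while .76 reproduces the tabled h^2 of .65.
print(round(communality([0.16, 0.16, 0.00, 0.18, 0.12]), 2))
print(round(communality([0.16, 0.76, 0.00, 0.18, 0.12]), 2))
```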

The three test factors markedly resemble clusters of items isolated independently by Cook and Medley (4) in a study of the Minnesota Multiphasic Personality Inventory, a traditional, undisguised test. They found that teachers, dichotomized according to their self-reported ability to get along with pupils, also varied on the MMPI in their hostility toward others, in their excessive adherence to rigid standards of morality, and in their pride in a thorough knowledge of subject matter. The “hostility” and “pharisaic-virtue” scales developed by Cook and Medley appear to involve content similar to that of our Factor II and Factor I, respectively, while “pride in knowledge” bears some resemblance to the fear of failure factor we isolated.

5 Initially, there was a tetrachoric correlation of −.22 between “maleness” and “northernness” because more of the males in our study were Southerners and more of the females were Northerners. The factor was labeled “Sampling Imbalance.” The original test scales were not correlated with this factor to any large degree. Inspection of Table 1 suggests that sex and geographic region were independent of all other variables except each other.

The high loading of “harm avoidance” on Factor IV is consistent with McClelland’s (5) finding that some subjects are motivated to achieve by hope of success, others by fear of failure.


Item Analysis 


The next step was to develop by item 
analysis a scale to measure each factor. 

A new sample of 200 subjects was drawn from the original pool of 2,000 cases. Its mean age was 21 years (SD 3.5 years) and its mean education 14.0 years (SD 1.5 years). Half the sample was from the South and half from other regions of the country; 55% were male and 45% female.

Pooled scores on original Scales 11 (Nurturance) and 12 (Superego Strength) served as a crude measure of conventional mores. Scale 8 (Aggression) was used as a first approximation of hostility, and Scale 4 (Achievement) as a first approximation of fear of failure.⁶

For each of the three crude scores thus derived, the sample of 200 was trichotomized into an upper-scoring 25%, a middle-scoring 50%, and a lower-scoring 25%. For the item analyses, the upper and lower 25% of each distribution were compared on their tendency to respond “Yes,” since about half of all responses fell into this category and about half were “?” or “No.”

Inspection of the percentage differences permitted the selection, for each scale, of items that correlated positively with that scale and were relatively independent of performance on the other two scales.

For example, consider the selected item “Meekness is better than vengeance,” which was accepted by the criterion groups as follows:


                        Conventional               Fear of
Criterion Group                Mores   Hostility   Failure
Upper 25%                        76%         44%       60%
Lower 25%                        36          56        44
% Difference                    +40         −12       +16


This item was included in the final Conventional Mores scale because it discriminated the “highs” from the “lows” on that scale but not on the others. Most items selected were accepted by at least 40 percentage points more of the upper than of the lower criterion group on the scale for which the item was selected, and by less than 20 percentage points more of the upper than of the lower group on each of the other scales.⁷

6 For a discussion of crude versus accurate methods of estimating factor scores, the reader is referred to R. B. Cattell, Factor analysis. New York: Harper, 1952, p. 80. In the present situation, the selected scales correlated so highly with the factors they were to represent that little was expected to be gained by using multiple-regression procedures to weight the scales optimally for a maximum correlation with each factor.
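The selection procedure can be sketched as follows, assuming answers are coded "yes", "?", or "no" for each examinee and that the 40-point and 20-point cutoffs are applied strictly (the text says only that most selected items met them). The function names are hypothetical; the worked call uses the percentage differences printed for “Meekness is better than vengeance.”

```python
def split_quartiles(scores):
    """Indices of the top 25% and bottom 25% of examinees on a crude score."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    k = len(scores) // 4
    return set(order[-k:]), set(order[:k])


def pct_yes(answers, group):
    """Percentage of a group answering "yes" to one item."""
    return 100.0 * sum(answers[i] == "yes" for i in group) / len(group)


def select_item(diffs, target, keep_min=40.0, other_max=20.0):
    """Keep an item if it separates the upper and lower groups by at least
    keep_min points on its own scale and by less than other_max points on
    every other scale."""
    return diffs[target] >= keep_min and all(
        d < other_max for scale, d in diffs.items() if scale != target)


# Percentage differences printed for "Meekness is better than vengeance"
diffs = {"C": 40, "H": -12, "F": 16}
print(select_item(diffs, "C"))  # True
```

In use, each item's difference on a scale is pct_yes over the upper group minus pct_yes over the lower group, computed separately for the C, H, and F criterion splits.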

Thirty items were selected from the 300 for 
the C or Conventional Mores scale, 30 for the 
H or Hostility scale, and 20 for the F or Fear 
of Failure scale. 


Reliability and Interrelations 


A new sample of 100 subjects previously 
unused was drawn from the original pool of 
2,000. This sample averaged 28.2 years in 
age with an SD of 12.8 years. Similar to 
preceding samples, its mean amount of edu- 
cation was 14.0 years with an SD of 2.2 years. 
Forty-six per cent were Southerners, and 49% 
were male; 54% were from the North or 
West, and 51% were female. 

For this sample, the intercorrelations among the scales were r(C,H) = .45, r(C,F) = .54, and r(H,F) = .48. Corrected split-half reliabilities were as follows: Conventional Mores, .83; Hostility, .72; and Fear of Failure, .69.
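The paper does not name its correction, but split-half coefficients are conventionally stepped up with the Spearman-Brown formula for a test of double length. A minimal Python sketch, assuming that formula; the inversion shows the half-test correlations the reported values would imply under that assumption.

```python
def spearman_brown(r_half, factor=2.0):
    """Step a split-half correlation up to the reliability of the full test."""
    return factor * r_half / (1 + (factor - 1) * r_half)


def half_from_full(r_full):
    """Invert the two-half case: the half-test r implied by a corrected value."""
    return r_full / (2 - r_full)


for scale, r_full in [("Conventional Mores", 0.83), ("Hostility", 0.72),
                      ("Fear of Failure", 0.69)]:
    print(scale, round(half_from_full(r_full), 2))
```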

An analysis of the reliability of the scales 
and the intercorrelations among them was 
performed for an available more homogeneous 
sample of 147 Louisiana Penitentiary in- 
mates. The inmates averaged 29.3 years in 
age with an SD of 7.9 years. Mean educa- 
tion was 10.5 years with an SD of 1.3 years. 
The sample was all male and almost totally 
Southern in origin. 


7 A maximum value of .1 is possible for the standard error of the difference between proportions in two samples of 50 cases each. A difference of .4 would therefore be at least four times the standard error of the difference and would be expected to occur by chance much less than 1% of the time.
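The footnote's arithmetic can be checked directly. This sketch computes the standard error of a difference between two independent proportions at its maximum (p = .5 in both groups of 50) and a two-sided normal-approximation probability for a difference of .40.

```python
import math


def se_diff_proportions(p1, p2, n1, n2):
    """Standard error of p1 - p2 for two independent samples."""
    return math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)


# Worst case: p = .5 in both criterion groups of 50 cases
se_max = se_diff_proportions(0.5, 0.5, 50, 50)
z = 0.40 / se_max
p_two_sided = math.erfc(z / math.sqrt(2))  # normal approximation
print(round(se_max, 3), round(z, 1), p_two_sided < 0.01)
```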


For this penitentiary sample the intercorrelations were r(C,H) = −.21, r(C,F) = .10, and r(H,F) = −.30. Corrected split-half reliabilities were: Conventional Mores, .72; Hostility, .58; and Fear of Failure, .45.⁸

A revised inventory of 90 items was prepared, containing 30 items for each of the three scales; ten of the Fear of Failure items were newly written.⁹ For a sample of Louisiana State college sophomores, the intercorrelations among these scales were r(C,H) = −.12, r(C,F) = .32, and r(H,F) = .42. The corrected split-half reliabilities were: Conventional Mores, .73; Hostility, .69; and Fear of Failure, .75.


8 Assuming that inmates of a state penitentiary are more homogeneous than a naturally scattered sample of men and women, these results illustrate that the reliability and independence of factors are a function of the homogeneity of the samples used. The more homogeneous the sample, the lower the factorial reliability and the higher the factorial independence. Reliability and orthogonality are relative to sample homogeneity.

9 The ten items were added in an attempt to increase the reliability of the Fear of Failure scale.


Summary


To develop a disguised but objective personality inventory, a factor analysis was performed on scores based on 400 examinees’ tendencies to accept or reject 13 lists of proverbs constructed to cover 13 areas. The three test factors that emerged following seven rotations were Conventional Mores, Hostility, and Fear of Failure. Using 200 new examinees, scales were constructed by item analysis to measure each. In subsequent samples, the three scales were found to have corrected split-half reliabilities ranging from .45 to .83 and intercorrelations ranging from −.12 to .54. The reliabilities and intercorrelations among the scales were higher when the groups were more heterogeneous in background.

The reliabilities and intercorrelations among the scales suggest that three separate behavioral tendencies are being assessed. Subsequent reports will deal with the relations between the scales and intelligence, other personality test scores, peer ratings, and occupational success as salesmen or industrial supervisors.

A revised 90-item form has been prepared. 


Received December 19, 1955. 


References 
1. Baumgarten, F. A proverb test for attitude measurement. Personnel Psychol., 1952, 5, 249-261.

2. Bartlett, J. Familiar quotations. Boston: Little, Brown, 1948.

3. Campbell, D. The indirect assessment of social attitudes. Psychol. Bull., 1950, 47, 15-38.

4. Cook, W. W., & Medley, D. M. Proposed hostility and pharisaic-virtue scales for the MMPI. J. appl. Psychol., 1954, 38, 414-418.

5. McClelland, D. (Ed.) Studies in motivation. New York: Appleton-Century, 1955.

6. Murray, H. A. Explorations in personality. New York: Oxford Univer. Press, 1938.

7. Richmond, A. Modern quotations. New York: Dover, 1947.

8. Travers, R. M. W. A critical review of the validity and rationale of the forced-choice technique. Psychol. Bull., 1951, 48, 62-70.




A Scale Measuring Attitudes Toward Working for the Government¹


Barbara P. Aalto 
Counseling Center, University of California, Berkeley 


There is a widespread feeling among some segments of the population that working for the government is looked upon with disfavor. Public personnel officials frequently point to these attitudes as contributing factors in high turnover, in the inability to attract top-level talent to government jobs, and in low morale among government workers themselves. Since these attitudes may be factors in an individual’s choice of career or place of employment, they may be of interest to high school and college counselors.

A review of the literature reveals a lack of reliable and valid instruments for measuring attitudes toward government employment. The study described in this paper was designed to construct such a scale. A measure of attitudes toward government employment could be used in the following ways:

1. To identify high school and college stu- 
dents who might find satisfaction in a gov- 
ernment career. 

2. To aid in the counseling of students in 
their choice of a career. 

3. To identify and study morale problems 
within the government service. 

4. To evaluate the general level of atti- 
tudes toward government service in the gen- 
eral population, in specific groups, and in spe- 
cific geographical areas. 

5. To study changes in these attitudes with 
education, work experience, age, and changes 
in political administration. 

Two restrictions were placed on the problem to keep it within manageable proportions and to avoid confusion of results. First, the attitudes expressed were toward jobs in the federal government and in the career service, not toward elected officials or members of the Armed Forces. Second, the study dealt mainly with attitudes toward working for the government at the professional and managerial levels. White (5, 6), in studies of the prestige of government employment in 1929 and 1932, found differences in attitudes toward federal, state, and local government employment, and differences in attitudes toward jobs at various occupational levels.

1 This study was completed when the author was on the staff of the Student Counseling Bureau, University of Minnesota. It is a condensation of a dissertation submitted to the faculty of the University of Minnesota in partial fulfillment of the requirements for the degree of Doctor of Philosophy. The author wishes to express thanks to her advisor, Professor D. G. Paterson, for his interest and invaluable help in the project. The study could not have been completed without the generous help of a group of personnel directors in both government agencies and private business and industry who helped in the collection of the data.


Construction of the Scale 


Construction of the attitude scale followed a modified Likert-type procedure. Opinion statements were compiled from a variety of sources, including previous attitude surveys, occupational information material, and the stated opinions of University of Minnesota graduate students. Opinions expressed in public administration journals and in a government employees’ newspaper, The Government Standard, were also adapted for use. These statements were edited by the writer, three faculty members with experience in attitude scale construction, and one faculty member from the political science department. Half the statements were worded in a direction favorable to government employment and half in a negative direction. The readability of the items, as ascertained by the Flesch (1) “Reading Ease” formula, was similar to that of a digest-type magazine, well within the comprehension of high school seniors.
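The Flesch Reading Ease score is computed from word, sentence, and syllable counts. The sketch below takes the counts as inputs rather than counting syllables itself, since syllable-counting rules are the error-prone part; the counts in the example are hypothetical, not those of the item pool.

```python
def flesch_reading_ease(words, sentences, syllables):
    """Flesch Reading Ease from raw counts; higher scores mean easier text.

    206.835 - 1.015 * (words per sentence) - 84.6 * (syllables per word)
    """
    return 206.835 - 1.015 * (words / sentences) - 84.6 * (syllables / words)


# Hypothetical counts for a batch of opinion statements
print(round(flesch_reading_ease(words=100, sentences=8, syllables=140), 1))
```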

A preliminary form (Form A) of 109 items 
was administered during class time to 173 
students in introductory laboratory psychol- 
ogy at the University of Minnesota. Re- 
sponses to items were made on a 5-point scale 
from “strongly agree” to “strongly disagree.” 
Each item was scored by assigning a weight of 5 to “strongly agree,” 4 to “agree,” 3 to “undecided,” 2 to “disagree,” and 1 to “strongly disagree” if the item was worded in a positive direction (favorable to government employment). The weights were reversed for negatively stated items. The total score was the sum of the item weights. Results of this administration showed that the items elicited a wide diversity of attitudes toward government employment and were not ambiguous to the respondents. Sixteen items of low discriminating capacity were eliminated using the Rundquist-Sletto (3) item-scale difference method of item analysis. Seven items were added to provide more discriminating positively stated items, since in the item analysis a larger proportion of the negatively stated items had high discrimination values.
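The scoring rule for the 5-point responses, with reversal for negatively worded items, can be sketched as follows; the response codes (SA, A, U, D, SD) are hypothetical labels. A respondent marking “undecided” on every item of the 70-item final scale lands exactly on the neutral point of 210.

```python
# Weights for a positively worded item, as given in the text
WEIGHTS = {"SA": 5, "A": 4, "U": 3, "D": 2, "SD": 1}


def item_score(response, positive):
    """Weight for one response; negatively worded items are reversed (6 - w)."""
    w = WEIGHTS[response]
    return w if positive else 6 - w


def total_score(responses, positive_flags):
    """Sum of item weights; positive_flags[i] is True for a favorable item."""
    return sum(item_score(r, p) for r, p in zip(responses, positive_flags))


# 35 positive and 35 negative items, all answered "undecided"
print(total_score(["U"] * 70, [True, False] * 35))  # 210
```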

It was felt that a better scale would result if the item analysis were based on an outside criterion as well as on internal consistency. This procedure is in keeping with commonly accepted methods of test construction but has been used less often with the traditional Likert-type methods of attitude scale construction. Government workers who were satisfied with their jobs and assigned a low rank to private employment were selected as representative of individuals with the attitudes most favorable to working for the government. In contrast, employees in private business who were satisfied with their work and gave a low rank to government employment were assumed to represent the attitudes least favorable to working for the government.

The 100-item questionnaire, Preliminary 
Form B, was subsequently administered to 493 
federal government employees and 299 pri- 
vate employees. They were mainly employed 
in occupations defined in the Dictionary 
of Occupational Titles (4) as professional 
and managerial. Included with the question- 
naire were the Hoppock Job Satisfaction 
Blank (2) and a personal data sheet. The 
questionnaires were distributed by agency 
and firm personnel officers, completed anony- 
mously by employees, and returned to the 
writer in sealed envelopes. Each agency or 
firm personnel officer endeavored to get as
large a sample of employees as possible.
Collection of data extended over a period 
from June, 1953, to July, 1954. This was a 
period in which the same administration (Re- 
publican) was in office, but not long after a 
change of administration. The government 
group was from Minnesota and Washington, 
D. C.; the private group was from large firms 
mainly in Minnesota, Ohio, Connecticut, and 
New Jersey. The two groups were compa- 
rable in age, sex, work experience, educational 
level, and occupational distribution. 

An item analysis was performed using both an internal and an external criterion. The internal criterion was membership in the top or bottom 27% in total score irrespective of place of employment (based on a total sample of 733 from both government and private employment at the time the analysis was done). The external criterion was a composite of job status and job satisfaction. The government criterion group (N = 249) consisted of currently employed government workers who met a stated definition of being “satisfied with government employment.” The satisfaction definition was twofold: a score above the 53rd percentile on the Hoppock Job Satisfaction Blank (2), and a rating of first or second choice given to present employment on a question ranking places of employment (be own employer, educational institution, federal government, local government, private business, state government). The private criterion group (N = 163) was composed of employees of private business and industry who were likewise satisfied with their jobs. The government and private criterion groups were similar in age, educational, and occupational distribution. Each criterion group was split to provide a criterion group and a cross-validation group. Items were eliminated that did not differentiate, at the .01 level by chi-square analysis, on both the external and the internal analysis and in both the validation and cross-validation groups. In addition, items were eliminated that did not show a scale value difference over .5000 using the Rundquist-Sletto (3) technique or that clearly duplicated statements in other items. Items were selected so as to provide equal numbers of positively and negatively stated items.
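A sketch of the two ingredients of this item analysis: forming the top and bottom 27% on total score, and testing an item with chi-square at the .01 level. It assumes each item's responses are collapsed to agree versus not agree, giving a 2x2 table with 1 degree of freedom (critical value 6.635); the article does not say how the 5-point responses were grouped, and the counts are hypothetical.

```python
def extreme_groups(scores, fraction=0.27):
    """Indices of the top and bottom `fraction` of cases by total score."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    k = round(len(scores) * fraction)
    return order[:k], order[-k:]


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square, without continuity correction, for [[a, b], [c, d]]."""
    n = a + b + c + d
    denominator = (a + b) * (c + d) * (a + c) * (b + d)
    return n * (a * d - b * c) ** 2 / denominator


CRITICAL_01_DF1 = 6.635  # chi-square critical value, df = 1, alpha = .01

# Hypothetical item: rows = government / private criterion group,
# columns = agree / not agree
chi = chi_square_2x2(90, 35, 30, 52)
print(round(chi, 1), chi > CRITICAL_01_DF1)
```

The same test would be run separately on the internal (27%) groups, the external criterion groups, and their cross-validation halves, keeping an item only if it passed every time.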


Table 1

Distribution of Scores on Final Scale, Government Employment Attitudes Scale, for 493 Government Workers and 299 Private Employees

Score     Government   Private   Total
325-339            3         0       3
310-324            5         0       5
295-309           26         1      27
280-294           75         1      76
265-279          143        11     154
250-264          115        22     137
235-249           68        46     114
220-234           23        40      63
205-219           19        64      83
190-204           11        43      54
175-189            2        39      41
160-174            2        24      26
145-159            0         5       5
130-144            1         3       4

N                493       299     792
Mean           261.9     213.3   243.6
SD              25.6      29.8    36.0
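As a check on Table 1, the group means can be recomputed from the grouped frequencies using interval midpoints (332 for 325-339, and so on). The lowest government frequency, printed illegibly, must be 1 for the column to total 493. With these frequencies the midpoint method happens to reproduce the reported means of 261.9 and 213.3 to one decimal; this is a checking sketch, not necessarily how the author computed them.

```python
# Lower limits of the 15-point score intervals in Table 1, and frequencies
LOWER = [325, 310, 295, 280, 265, 250, 235, 220, 205, 190, 175, 160, 145, 130]
GOV = [3, 5, 26, 75, 143, 115, 68, 23, 19, 11, 2, 2, 0, 1]
PRIV = [0, 0, 1, 1, 11, 22, 46, 40, 64, 43, 39, 24, 5, 3]


def grouped_mean(lower_limits, freqs, width=15):
    """Mean from grouped integer scores, using each interval's midpoint.

    For integer scores the interval 325-339 has midpoint 325 + 7 = 332.
    """
    midpoints = [lo + (width - 1) / 2 for lo in lower_limits]
    return sum(m * f for m, f in zip(midpoints, freqs)) / sum(freqs)


print(sum(GOV), sum(PRIV))                    # 493 299
print(round(grouped_mean(LOWER, GOV), 1))     # 261.9
print(round(grouped_mean(LOWER, PRIV), 1))    # 213.3
```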


Statements such as “scientifc and profes- 
sional groups look down on people who work 
for the government,” “there is no need for 
government to be inefficient,” and “the fact 
of job security would make me want to work 
for the government” failed to yield signifi- 
cant results on all analyses. An item like 
“government workers are as honest as those 
privately employed” brought responses of 
“agree” or “strongly agree” from both groups, 
while both groups disagreed with the idea 
that “the government service is full of Com- 
munists.”” 


The Final Scale 


The Final Scale, Government Employment 
Attitudes Scale, contained 70 items. Exam- 
ples of items were as follows: In a govern- 
ment job, it is hard to make use of one’s own 
ideas; government workers keep trying to do 
a better job; a government job would be all 
right if you couldn’t get another job; good 
college students should be urged to enter 
government service. The responses of 493 
government employees and 299 private em- 
ployees were rescored on the basis of this 

² A copy of the items that comprise the Government Employment Attitudes Scale may be obtained from the author, Counseling Center, University of California, Berkeley 4, California.




Final Scale. Table 1 shows the distribution 
of scores for both groups. The average score 
for government workers was 261.9 with a 
standard deviation of 25.6 and for private 
employees, 213.3 with a standard deviation 
of 29.8. The neutral point would be 210. 
The higher the score the more favorable the 
attitudes toward government employment. 
The difference in the attitudes of govern- 
ment and private business employees is of 
considerable magnitude since only about 5% 
of the privately employed were more favor- 
able in attitudes toward government employ- 
ment than the average federal government 
employee. The attitudes of the average pri- 
vate employee would be described as neutral 
rather than negative to government employ- 
ment. It is of interest to note, however, that 
on a more general measure of job satisfaction, 
the Hoppock Job Satisfaction Blank, there 
was no difference in average scores between 
the two groups. The average score for gov- 
ernment employees was 21.4 with a standard 
deviation of 2.6, while the average score for 
private employees was 21.5 with a standard 
deviation of 2.6. A score of 21 is assigned
a percentile rank of 53 on the norms given
by Hoppock for 309 adults (88% of all
employed adults in New Hope, Pennsylvania,
in 1933). Federal government employees do
not appear to be the dissatisfied, low-morale
group they are frequently reported to be. At
the professional-managerial levels, they are
as satisfied as their private-business
counterparts.
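The "only about 5%" figure can be checked roughly against the private-employee column of Table 1. The sketch below assumes scores are spread evenly within each 15-point interval and uses real limits half a point beyond the printed ones. Under those assumptions about 5.6% of the 299 private employees fall above the government mean of 261.9, in line with the text. The function name is illustrative.

```python
# Private-employee column of Table 1 as (lower printed limit, frequency).
PRIVATE = [(325, 0), (310, 0), (295, 1), (280, 1), (265, 11), (250, 22),
           (235, 46), (220, 40), (205, 64), (190, 43), (175, 39), (160, 24),
           (145, 5), (130, 3)]
WIDTH = 15  # each interval spans 15 score points


def proportion_above(freq_table, x, width=WIDTH):
    """Estimate the share of cases scoring above x, assuming cases are spread
    evenly across each interval's real limits (printed lower limit - 0.5)."""
    total = sum(f for _, f in freq_table)
    above = 0.0
    for lower, f in freq_table:
        real_lower = lower - 0.5
        real_upper = real_lower + width
        if x <= real_lower:
            above += f                             # whole interval lies above x
        elif x < real_upper:
            above += f * (real_upper - x) / width  # only part of it lies above x
    return above / total


share = proportion_above(PRIVATE, 261.9)  # 261.9 = government mean
print(f"{share:.1%} of private employees above the government mean")
```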

As shown in Table 2, a reliability coeffi- 
cient of .94 (corrected by the Spearman- 
Brown formula) was found for the govern- 
ment sample and .96 (corrected) for the pri- 
vate group. These results indicate that the 


Table 2
Reliability of the Final Scale

                            Reliability
                            Coefficient    Corrected
Group                N      (odd-even)     Coefficient*
Government sample   493     .89            .94
Private sample      299     .92            .96


* Corrected by the Spearman-Brown prophecy formula.


Table 3
Summary of Validity Data

                                                                          Diff.
                                                                         Between          t
Groups Studied                            N1    N2   F          M1     M2     Means   Weighted  Observed
Validation groups: satisfied gov't (1)
  vs. private (2)†                        125   82   1.9794**   270.1  211.0  59.1    2.6387    14.5494**
Cross-valid. groups: satisfied gov't (1)
  vs. private (2)†                        124   81   2.2818**   275.6  209.9  65.7    2.6394    18.7302**
Gov't work 1st choice (1) vs. private
  employer 1st choice (2)†                138  118   2.4443**   273.0  218.4  54.7    2.6063    15.9866**
Satisfied gov't workers (1) vs.
  dissatisfied gov't (2)                  249  186   1.1139     272.8  248.8  24.0              10.4202**
Satisfied private (1) vs. dissatisfied
  private (2)                             191   95   1.0790     210.3  220.7  10.4               2.8726**
Dissatisfied gov't (1) vs. dissatisfied
  private (2)                             186   95   1.4482     248.8  220.7  28.2               8.4653**


† Significance test used was the Cochran-Cox method to test the hypothesis of equality of means with no hypothesis about the population variance.
** Significant at the .01 level.


scale measures with a high degree of consist- 
ency within each group. 
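The corrected coefficients in Table 2 follow from the Spearman-Brown formula, which steps an odd-even (half-test) correlation up to the reliability of the full-length scale. A minimal sketch reproducing .94 and .96 from the tabled .89 and .92 is shown below. The function name is illustrative.

```python
def spearman_brown(r_half, k=2.0):
    """Reliability of a test k times as long as the part whose reliability is r_half.
    With k = 2 this steps an odd-even (half-test) correlation up to full length."""
    return k * r_half / (1 + (k - 1) * r_half)


for group, r_odd_even in [("Government", 0.89), ("Private", 0.92)]:
    print(group, round(spearman_brown(r_odd_even), 2))
```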

Several types of evidence indicate that the 
scale has validity and does in fact measure 
attitudes toward government employment. 
These results are summarized in Table 3. 
All differences were significant at the .01 
level. With the exception of the first two, 
these analyses are based on the responses of
subjects who were wholly or partly outside
the original item analysis and thus provide
evidence of validity largely independent of
the scale construction groups.
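The Cochran-Cox procedure named in the Table 3 footnote compares two means without assuming equal population variances. The observed t uses separate variance estimates, and its critical value is a weighted average of the tabled t values for n1 - 1 and n2 - 1 degrees of freedom. Table 3 does not report group standard deviations, so the SDs below are hypothetical and the sketch does not reproduce the published values. The critical t inputs (about 2.62 and 2.64, roughly the two-tailed .01 points for 124 and 81 df) must be looked up separately. The F column appears to be the variance-ratio check that determined when this procedure was used, though the article does not say so explicitly.

```python
import math


def cochran_cox(m1, s1, n1, m2, s2, n2, t1_crit, t2_crit):
    """Return (observed t, weighted critical t) for two means with unequal variances.

    s1, s2 are sample standard deviations; t1_crit and t2_crit are tabled
    critical t values for n1 - 1 and n2 - 1 degrees of freedom.
    """
    w1 = s1 ** 2 / n1  # squared standard error of mean 1
    w2 = s2 ** 2 / n2  # squared standard error of mean 2
    t_observed = (m1 - m2) / math.sqrt(w1 + w2)
    t_weighted = (w1 * t1_crit + w2 * t2_crit) / (w1 + w2)
    return t_observed, t_weighted


# Means and Ns from the validation row of Table 3; the SDs (25.0, 35.0) are hypothetical.
t_obs, t_crit = cochran_cox(270.1, 25.0, 125, 211.0, 35.0, 82, 2.62, 2.64)
print(f"observed t = {t_obs:.2f}, weighted critical t = {t_crit:.3f}")
print("significant" if abs(t_obs) > t_crit else "not significant")
```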

1. The criterion group of satisfied govern- 
ment workers differed significantly from the 
criterion group of satisfied private employees. 
Furthermore, the differences were of sufficient 
magnitude to have practical guidance and 
selection values, as shown by the fact that 
less than 1% of satisfied private employees 
reached or exceeded the mean of satisfied 
government workers. 

2. The cross-validation groups likewise 
showed statistically significant differences. 
None of the satisfied private employees in 
the cross-validation group reached or ex- 
ceeded the mean of satisfied government 
workers.

3. Satisfied government workers scored sig- 
nificantly higher than dissatisfied government 


workers. Again, the difference was of con- 
siderable magnitude. 

4. Satisfied private employees scored significantly lower than dissatisfied private
workers, but the difference was not as great 
as in the preceding comparison.

5. Dissatisfied government workers scored 
much higher than dissatisfied private em- 
ployees. 

6. Workers who gave “federal government” 
employment first choice on a ranking ques- 
tion scored significantly higher than those 
who ranked “private employer” first choice.
These differences were also large. 

7. As shown in Table 4, a correlation be- 
tween Government Employment Attitudes 
scores and the Hoppock Job Satisfaction 
Blank in the government group was +.45.
This positive relationship, however, did not 


Table 4

Correlation Between Government Employment Attitudes and Job Satisfaction for Government and Private Employees

                      Correlation
Group          N      Coefficient
Government    431     .45**
Private       287     -.08


** Significant at the .01 level.




hold for the private group. The correlation 
of -.08 was not significant.

8. The scale items, in addition, appeared 
to have “face validity.” 
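The significance of the two correlations in Table 4 can be checked with the usual t test for r, which has N - 2 degrees of freedom. The sketch gives t of about 10.4 for the government group and about -1.35 for the private group. The two-tailed .01 critical value for these degrees of freedom is roughly 2.6 (a tabled figure, not computed here), which matches the significant/nonsignificant split reported.

```python
import math


def t_for_r(r, n):
    """t statistic, with n - 2 degrees of freedom, for testing whether r differs from zero."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)


for group, r, n in [("Government", 0.45, 431), ("Private", -0.08, 287)]:
    print(group, round(t_for_r(r, n), 2))
```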


Limitation of the Scale 


It appears on the basis of this study that 
the Government Employment Attitudes Scale 
has sufficient reliability and validity to war- 
rant experimental use in counseling, selection, 
and research. Several suggestions for refine- 
ment and improvement could be mentioned 
for future study. These further studies would 
give the personnel worker greater confidence 
in his use of the instrument. The reliability 
and validity should be checked by adminis- 
tration of the Final Scale to a new group of 
subjects to ensure that differences between
satisfied government and private employees
would hold up under further cross-validation.
Likewise, a check should be made of the re- 
liability of the scale upon repeated adminis- 
tration. 

If the measure is to be used with high school 
and college students, norms for these groups 
should be established. The relationship be- 
tween attitudes toward government employ- 
ment expressed while in school and later job 
satisfaction should be investigated in order 
to measure the predictive significance of the 
scale. 

One of the problems in the use of this scale 
for selection purposes is the possibility of 
faking responses. The items are by no means 
subtle and refer directly to attitudes toward 
government employment. This situation is 
one that does not warrant despair, but should 
be studied. The influence of faking could be 
investigated by administering the scale to an 
experimental group with the usual directions 
followed by directions to fake. A scoring 
technique might be devised that would meas- 
ure the degree of gross faking. 

The Government Employment Attitudes 
Scale was directly constructed to discrimi- 
nate between satisfied government and pri- 
vate employees in professional and manage- 
rial positions. However, the items might 
have high validity and reliability for use with 
workers in other occupational categories. Fur- 
ther study would be needed on the use of the 
scale at these other occupational levels. Like- 




wise, the responses to the items on the scale 
were in terms of federal government employ- 
ment. The hypothesis that the same items 
could be used to measure attitudes toward 
state and local government could be tested. 
The utility of the scale would be enhanced if 
it were possible to extend its use to other oc- 
cupational groups and to other levels of gov- 
ernment employment. 


Summary 


1. The purpose of the study was to con- 
struct a reliable and valid measure of atti- 
tudes toward government employment which 
could be used in counseling, selection, and 
research.

2. A preliminary scale of 109 items was 
administered to 173 sophomore students in 
introductory laboratory psychology. Items of 
low discrimination value were eliminated. 

3. The standardization was extended to 
493 federal government employees and 299 
employees of private business and industry, 
mainly in professional and managerial occu- 
pations. An item analysis was done on the 
basis of both an internal criterion (top vs. 
bottom 27% in total score) and an external 
criterion (satisfied government workers vs. 
satisfied private employees). A validation 
and cross-validation group were provided for 
each analysis. 

4. The Final Scale consisted of 70 items 
and appeared to have sufficient reliability and 
validity for further experimental use. 


Received March 20, 1956. 


References 


1. Flesch, R. A new readability yardstick. J. appl. 
Psychol., 1948, 32, 221-233. 

2. Hoppock, R. Job satisfaction. New York: Har- 
per, 1935. 

3. Rundquist, E. A., & Sletto, R. F. Personality
in the depression: a study in the measurement 
of attitudes. Minneapolis: Univer. of Min- 
nesota Press, 1936. 

4. U. S. Employment Service. Occupational classifi- 
cation. Vol. 2. Dictionary of occupational 
titles. Washington: U. S. Government Printing Office, 1949.

5. White, L. D. The prestige value of public em- 
ployment. Chicago: Univer. of Chicago Press, 
1929. 

6. White, L. D. Further contributions to the prestige value of public employment. Chicago:
Univer. of Chicago Press, 1932. 


Journal of Applied Psychology
Vol. 40, No. 6, 1956


Evaluation of a Supervisory Training Program with 
How Supervise? 


Richard P. Barthol 
University of California at Los Angeles 


and Martin Zeigler 


The Pennsylvania State University 


The problem of evaluating a training pro- 
gram, particularly training in human rela- 
tions, besets many personnel managers and 
industrial psychologists. How Supervise? has 
been offered as an instrument for this pur- 
pose. If it is so used, we must assume first 
that the test does measure information about 
desirable supervisory practices, and second 
that supervisory practices will be improved 
if the supervisors have learned those prin- 
ciples that represent approved supervisory 
practices. Karn (2) showed that, over a short
period of time, no significant changes occurred
without training, so that a control group was
not necessary. Several investigators (4, 5,
6, 8) have shown that scores on How Super- 
vise? are related to education or intelligence, 
and furthermore (3) that the reading level of 
the test is at that of the high school graduate. 
Wickert (7) found that Form B was more 
sensitive than Form A as the posttest. 

This study is of the bootstrap variety: if a 
training program is effective and if How Su- 
pervise? is a measure of effectiveness, then a 
posttest on Form B should yield significantly 
higher scores than a pretest on Form A. 
Positive results would support the notion that 
the test has some validity and that the train- 
ing program was effective to some degree. 
Negative results would be inconclusive. 


Methods and Procedure 


The Westinghouse Electric Corporation conducted 
a program of supervisory training of twenty weekly 
meetings, each an hour and a half long. The sub- 
jects were 210 supervisory employees: foremen, gen- 
eral foremen, and department supervisors. They 
represented what are commonly called first- and
second-line supervisors. Ages ranged from 25 to 60 
years. 

The conference method was used throughout. 
Prior to presenting the program, eighteen leaders 
were selected from the supervisory group and given


a one-week intensive leader training course in the 
material to be covered. In presenting the course to 
other supervisors, each leader followed a common 
manual in a carefully prescribed manner. The course 
content included production control, accident pre- 
vention, cost problems, budgets, material control, 
job instructor training, and human relations. No
attempt was made to cover the material contained 
in How Supervise?, nor did the conference leader 
have access to copies of the test, which was
administered by the training director. There is
reason to believe that the course was not geared
toward obtaining good grades on the measuring
instrument.

Form A of the test was administered at the be- 
ginning of the program and Form B was adminis- 
tered at the end. The scoring followed the pro- 
cedures outlined in the test manual (1). Additional 
data were collected so that the subjects could be 
subdivided by educational level and previous super- 
visory training. 


Results 


Table 1 indicates that all groups achieved 
significantly higher scores on the posttest. 
The How Supervise? manual does not give 
adequate norms for Level II supervisors, 
which is probably the proper classification 
for this group, but it was estimated that the 
total group started the program at slightly 
below the norm mean and finished at well 
above the mean. The “college” group 
started above the norm mean and finished 
something over one standard deviation above 
the mean. As might have been expected
from the earlier studies, the college group
did significantly better (.001 level) than the
elementary or secondary groups. The difference
between the elementary and secondary groups 
was virtually zero. The most striking educa- 
tional differences, as indicated by the stand- 
ard deviations in Table 1, were the changes 
in the variability. The college group showed 
a remarkable shortening of the distribution: 
the lowest score in the pretest was 24; the 


Table 1
Analysis of Scores on How Supervise? Before and After a Training Program

                               Mean             SD
                        N    Pre     Post    Pre     Post     t**
Total Group           210   41.64   50.61   10.86    9.33   25.33
Education*
  Elementary           19   40.53   47.16   10.41   10.35    4.92
  Secondary            96   38.90   47.80   11.16    7.92   16.55
  College              60   47.43   57.40    8.37    4.95   14.01
Previous Training Program
  Yes                  73   42.73   51.11   10.47    6.53   13.47
  No                  137   41.07   50.42   11.06   10.12   21.14


* 35 subjects were dropped from this classification because their educational level was not reported.
** All values are significant beyond the .001 level. Values of t were computed by using difference scores on each subject: t = D̄/S_D̄.


lowest score in the posttest was 43, only four 
points below the means of the other two 
groups. 
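The footnote to Table 1 defines t as the mean of each subject's pre-to-post difference divided by the standard error of that mean. A minimal sketch of this paired computation follows. The five score pairs are invented for illustration, since individual scores are not reported, and the function name is illustrative.

```python
import math
from statistics import mean, stdev


def paired_t(pre, post):
    """Return (t, df) for t = mean difference / standard error of the mean difference."""
    if len(pre) != len(post) or len(pre) < 2:
        raise ValueError("need two equal-length lists with at least two subjects")
    diffs = [after - before for before, after in zip(pre, post)]
    se_mean_diff = stdev(diffs) / math.sqrt(len(diffs))  # S of the mean difference
    return mean(diffs) / se_mean_diff, len(diffs) - 1


# Invented pretest (Form A) and posttest (Form B) scores for five supervisors.
pre = [40, 35, 50, 45, 38]
post = [48, 41, 55, 52, 47]
t, df = paired_t(pre, post)
print(f"t = {t:.2f} with {df} df")
```

Working from difference scores, rather than comparing the two group means, removes stable between-subject variation from the error term, which is why a gain of a few points can yield a large t.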

This same kind of reduced variability oc- 
curred in the group that had had previous 
training, although not quite so dramatically. 
The means of the two groups, with and with- 
out previous training, were approximately the 
same on both pre- and posttest, and each 
group showed significant improvement. How- 
ever, the standard deviation in the previously 
trained group dropped approximately four 
points while the standard deviation of the 
other group dropped only one point. 

The manual (1) gives an example of im- 
provement resulting from a training program. 
The mean of Level II supervisors was 49.3 
before training and 52.9 after training, a 
change of 3.6 points. This may be compared 
with the present change from 41.64 to 50.61, 
a difference of 8.97 points. The manual does 
not indicate any way of interpreting such dif- 
ferences. 


Discussion 

The results of this study seem to confirm 
the earlier findings cited in this paper that 
How Supervise? is more readily interpreted 
by subjects who have graduated from high 
school. However, there did not appear to be 
any significant differences between subjects 
who had gone only to elementary school and 
those who had gone to high school. Although 
all groups showed significant gains after train- 


ing, the college group was the most promising 
in that almost all of this group were above 
the mean of the norm group after training. 

The large unanswered question is this: do
these results indicate some kind of superior
ability (or motivation) in those supervisors
who had been to college, or do they mean that
the test does not adequately measure improvement
in subjects who had not gone beyond high
school? Since a test of this kind is of great
importance to organizations that want to measure
the effectiveness of a supervisory training
program, it is suggested that another study be
made that would parcel out such factors as age,
seniority, intelligence, and motivation, so that
we may know whether the readability of the
instrument is a primary factor in causing
differences. Or possibly the suggestion made by
Maloney (3) should be carried out and the test
revised so that the problem disappears.
Additional data should also be collected so
that there is some way of evaluating, in
absolute terms, a change due to a training
program.


Summary 


A group of supervisors was tested before
and after a training program with alternate
forms of How Supervise?. The group was
subdivided by educational level. Although
all groups improved significantly, the
greatest gains were made by supervisors who
had gone to college. Lower-ranking subjects
who had had previous training showed more
improvement than the lower-ranking subjects
who had not had previous training, although
the mean scores of the two groups were the
same. It was suggested that the instrument
is useful for assessing the effectiveness of a
supervisory training program but that more
work must be done on the readability of the
test and on the meaning of score changes
following a training program.


Received December 19, 1955. 


References 


1. File, Q. W., & Remmers, H. H. How Supervise? 
(Revised manual.) New York: Psychological 
Corporation, 1948. 

2. Karn, H. W. Performance on the File-Remmers Test, How Supervise?, before and after a course in psychology. J. appl. Psychol., 1949, 33, 534-539.

3. Maloney, P. W. Reading ease scores for File’s 
How Supervise?. J. appl. Psychol., 1952, 36,
225-227. 

4. Millard, K. A. Is How Supervise? an intelligence 
test? J. appl. Psychol., 1952, 36, 221-224. 

5. Sartain, A. Q. Relation between scores on cer- 
tain standard tests and supervisory success in 
an aircraft factory. J. appl. Psychol., 1946,
30, 328-339.

6. Weitz, J., & Nuckols, R. C. A validation study 
of How Supervise?. J. appl. Psychol., 1953, 
37, 7-8. 

7. Wickert, F. R. How Supervise? Scores before 
and after courses in psychology. J. appl. 
Psychol., 1952, 36, 388-392.

8. Wickert, F. R. Relation between How Super- 
vise?, intelligence and education for a group 
of supervisory candidates in industry. J. appl.
Psychol., 1952, 36, 303. 




An Item Analysis of How Supervise? Using Both Internal and External Criteria


Robert L. Decker¹


West Virginia University 


The present investigation was designed to 
determine the effectiveness of How Supervise? 
as a measure of supervisory ability in an in- 
dustrial situation. The method involved com- 
puting product-moment correlations between 
total scores on How Supervise? and measures 
of success in supervisory positions. In addi- 
tion, an item analysis which included meas- 
ures of item difficulty, item validity, and in- 
ternal consistency was made to determine 
how the test was functioning with the group 
of subjects under study. 

A review of the literature concerning the 
application of psychological tests in industry 
indicates that How Supervise? is widely ac- 
cepted as a measure of supervisory knowledge 
(1, 2, 9, 10, 11, 13, 16, 17, 18, 21, 26, 27). 
There are many studies indicating changes in 
scores achieved on the test as a result of su- 
pervisory training programs, courses in indus- 
trial psychology, etc. (2, 10, 11, 16, 18, 26). 
How Supervise? has also been used in the 
study of other aspects of industrial behavior 
such as attitudes, interests, etc. (14, 15, 19, 
20, 22, 23, 24, 25, 26). However, the stud- 
ies reported have not yielded sufficient infor- 
mation to justify the use of How Supervise? 
as an aid in selection, placement, or promo- 
tion, three of the most important areas for 
practical application of tests in industry. 
There is a lack of data which would offer 
reasonable support for the belief that success- 
ful supervisory performance can be predicted 
from scores on How Supervise?. 

It is true that in constructing the test the 
authors used a measure of validity (ratings 
of the members of the standardization group 
by their superiors) as well as measures of in- 
ternal consistency as bases for selecting items 
to be used in the final forms (5, 6). How- 

¹ The author wishes to express his appreciation to Dr. Richard S. Uhrbrock, Associate Director of Industrial Relations of The Procter and Gamble Company, for his kind suggestions and help in the formulation of the problem and treatment of the data.


ever, File reports that it was necessary to rely 
mainly on correlation with total score as a 
basis for selecting items because the ratings 
of the members of the standardization group 
proved to be unreliable and of questionable 
validity (5, 6). 

In an article published in 1946 File and 
Remmers reviewed some studies which sug- 
gest that there is a relationship between su- 
pervisory success and scores on How Super- 
vise? (4). In one study scores achieved by 
46 supervisors were compared with those of 
14 nonsupervisors who had been bypassed 
for promotion. The group of supervisors had 
a significantly higher score. Similar results 
were obtained in a comparison of the scores 
of 54 supervisors rated as superior with those 
of 20 supervisors who were rated as inferior. 
Since the groups of subjects were small, File 
did not regard the findings as conclusive. 

A study reported by Sartain in 1946 found 
no significant relationship between supervi- 
sory performance and scores on How Super- 
vise? (21). Forty members of supervision in 
an aircraft factory took a battery of tests 
which included the Experimental Edition 
(Form A) of How Supervise?. The scores 
on How Supervise? were compared with rat- 
ings of each of the supervisors by his immedi- 
ate superior, using correlation techniques, and 
no evidence of relationship was found. 


Procedure 
Subjects 


The subjects of the present study were 208 mem- 
bers of the supervisory staff of a large manufactur- 
ing organization. All were male college graduates 
who were hired as members of supervision during 
the ten-year period prior to the study. All subjects 
were either at the first or second level of supervision, 
i.e., their rank was the equivalent of either Foreman
or Supervisor, at the time of the study. Most of 
the men were hired directly after graduation from 
college and all were selected on similar standards. 
All subjects had participated in on-the-job supervi- 
sory training programs. 




Administration of the Test 


The test used was Form M of How Supervise?, 
the form recommended by its authors for use with 
office or higher level supervisors (3). The test is 
composed of 100 items dealing with problems, prac- 
tices, and opinions related to industrial supervision. 
The test is divided into three sections, namely, Su- 
pervisory Practices, Company Policies, and Super- 
visor Opinions. In the first section the subject re- 
sponds to each of a list of 20 statements concerning 
Supervisory Practices as desirable, undesirable or 
uncertain. In the second section he responds to 32
statements of Company Policies as desirable,
undesirable, or uncertain. In the third section composed
of 48 statements of Supervisor Opinion the subject 
is asked to express agreement, disagreement, or un- 
certainty with each opinion. The score on the test 
by the standard scoring procedure is the total num- 
ber right minus the total number wrong. The an- 
swer “uncertain” is not scored. 
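The standard scoring rule just described (rights minus wrongs, with "uncertain" not scored) might be coded as follows. The response codes "+", "-" and "?" and the four-item key are hypothetical stand-ins, not the published key. The function also returns the number right, which the item analysis below uses in place of the formula score.

```python
def score_how_supervise(responses, key):
    """Return (rights minus wrongs, number right).

    responses and key map item number -> "+" (desirable/agree) or "-"
    (undesirable/disagree); "?" (uncertain) and omitted items are not scored.
    """
    rights = wrongs = 0
    for item, keyed in key.items():
        answer = responses.get(item)
        if answer is None or answer == "?":
            continue  # "uncertain" or blank: neither right nor wrong
        if answer == keyed:
            rights += 1
        else:
            wrongs += 1
    return rights - wrongs, rights


# Hypothetical four-item key and one subject's answers.
key = {1: "+", 2: "-", 3: "+", 4: "-"}
answers = {1: "+", 2: "+", 3: "?"}  # item 4 left blank
print(score_how_supervise(answers, key))
```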

The test was administered to the subjects in 
groups of from 3 to 15. The men were told that 
an evaluation study was being made of the test and 
that their cooperation was needed to “test the test.” 
They were also told that the results of the test 
would remain confidential. Participation in the 
study was voluntary. The instructions recom- 
mended by the authors of the test were followed 
in the administration of the forms (3). 


The External Criterion 


Each subject was rated on a rating scale of super- 
visory performance which has been developed by 
King and Wingert and which is published by Indus- 
trial Psychology, Inc. (12). The rating scale con- 
tains 60 questions or statements about the ratee’s 
job efficiency. The statements cover such
performance areas as quantity, quality, job
knowledge, personal work habits, potential for
further development, etc. The rater responds to each question or
statement by checking “Yes or True” or “Not True 
at Present.” The scale is so arranged that one-half 
of the favorable responses require the rater to check 
“Yes or True” while for the other half the
favorable response is “Not True at Present.” Each
subject was rated by his immediate superior. The
statements in the rating scale were weighted according
to their D values and phi values from the group on 
which the test was standardized. For the purposes 
of the present study the total weighted raw score on 
the rating scale was used as the measure of success 
as a supervisor, i.e., the external criterion. 

As an additional check upon the acceptability of 
the rating scale as a measuring instrument a split- 
half reliability study was performed. Scores on the 
first half of the rating scale were compared with 
those for the second half. Application of Spearman- 
Brown techniques resulted in a corrected reliability 
coefficient of .898 (8). The present investigator
considered this reliability acceptable.




The Statistical Analyses 


The data from the study were punched into IBM
cards to facilitate analysis. The following statistical
analyses were performed.

1. A product-moment coefficient of correlation was
computed between the total score by the standard
scoring method, i.e., rights minus wrongs, on How
Supervise? (Form M) and supervisory performance
as measured by the total raw score on the rating
scale.

2. D values (fraction of the group failing) were
computed for each item.²

3. A biserial coefficient of correlation was com- 
puted between each item and the total number right 
on How Supervise?. The total number right was
used instead of the rights-minus-wrongs scoring for- 
mula suggested by the authors of the test to avoid 
any variations which might result from the scoring 
formula. 

4. A biserial coefficient of correlation was
computed between each item and the measure of
supervisory success, i.e., the raw score on the rating scale.
For both 3 and 4 above the items were scored as 
correct for the right answer or as incorrect for the 
wrong answer, a response of “uncertain,” or no re- 
sponse. The general formula suggested by Guilford 
was used in calculating the biserial rs (8). Biserial 
correlations were used instead of point biserials or 
phi coefficients because the purpose of the analysis 
was to determine the degree of correlation between 
the quality measured by the item and that measured 
by the total raw score on the test and by the rating 
of success (8). 

5. The test records for all subjects were rescored 
on the basis of the 25 items of the test which were 
found to correlate significantly with the criterion.
A Pearson product-moment coefficient of correlation 
was then computed between the total number right 
on these 25 items and the criterion scores.

6. A percentile table of scores on How Supervise? 
was constructed for the 208 subjects.
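Steps 2 to 4 can be sketched for one item as follows. The item is coded 1 for the right answer and 0 for a wrong answer, "uncertain," or no response, as the text specifies. The biserial uses a standard form of the formula. Whether it matches Guilford's presentation term for term is an assumption, and the six-subject data are invented. Note that a biserial r can exceed 1 in small or non-normal samples.

```python
from statistics import NormalDist, mean, pstdev


def item_difficulty(item_correct):
    """D value: fraction of the group failing the item (1 = right, 0 = otherwise)."""
    return 1 - sum(item_correct) / len(item_correct)


def biserial_r(item_correct, criterion):
    """Biserial r between a right/wrong item and a continuous measure.

    r_bis = (M_p - M_q) / s_t * (p * q / y), where M_p and M_q are the
    criterion means of those passing and failing, s_t the SD of all criterion
    scores, and y the normal ordinate dividing the curve into areas p and q.
    """
    passed = [c for i, c in zip(item_correct, criterion) if i]
    failed = [c for i, c in zip(item_correct, criterion) if not i]
    if not passed or not failed:
        raise ValueError("item must be passed by some, but not all, subjects")
    p = len(passed) / len(criterion)
    q = 1 - p
    y = NormalDist().pdf(NormalDist().inv_cdf(p))  # ordinate at the p/q split
    return (mean(passed) - mean(failed)) / pstdev(criterion) * (p * q / y)


# Invented data: six subjects' results on one item and their rating-scale scores.
item = [1, 0, 1, 0, 1, 0]
ratings = [9, 8, 7, 3, 6, 5]
print(item_difficulty(item), round(biserial_r(item, ratings), 3))
```

The same function serves both step 3 (criterion = total number right) and step 4 (criterion = rating-scale score).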


Results 


The scores on How Supervise? obtained by 
the 208 subjects ranged from 19 to 100 and 
averaged 79.61 with an SD of 10.04. The 
raw scores on the rating scale for supervisory
performance ranged from 17 to 107 and av- 
eraged 71.14 with an SD of 21.27. Both dis- 
tributions were tested for normality in terms 
of skewness and kurtosis (7). Neither the 
distribution of scores on How Supervise? 
nor the distribution of scores on the rating 
scale was significantly skewed. However, 


² This may be considered an inversion of the frequently used procedure of reporting item difficulty in terms of the percentage of the group answering the item correctly.




both distributions showed marked tendencies 
to be platykurtic. A visual inspection of the 
scatter diagram gave no indication of the 
presence of a curvilinear relationship in either 
case. 
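The text does not say which skewness and kurtosis formulas reference (7) supplies. The sketch below assumes the common moment coefficients g1 and g2 (excess kurtosis), with large-sample standard errors of √(6/N) and √(24/N). For N = 208 these are about .17 and .34. A negative g2 indicates a platykurtic distribution. The demonstration data are illustrative only.

```python
import math


def skew_kurtosis(scores):
    """Moment coefficients g1 (skewness) and g2 (excess kurtosis), with the
    large-sample standard errors sqrt(6/N) and sqrt(24/N).
    g2 < 0 indicates a flatter-than-normal (platykurtic) distribution."""
    n = len(scores)
    m = sum(scores) / n
    m2 = sum((x - m) ** 2 for x in scores) / n
    m3 = sum((x - m) ** 3 for x in scores) / n
    m4 = sum((x - m) ** 4 for x in scores) / n
    g1 = m3 / m2 ** 1.5
    g2 = m4 / m2 ** 2 - 3
    return g1, g2, math.sqrt(6 / n), math.sqrt(24 / n)


g1, g2, se_g1, se_g2 = skew_kurtosis([1, 2, 3, 4, 5])
print(f"g1 = {g1:.2f} (SE {se_g1:.2f}), g2 = {g2:.2f} (SE {se_g2:.2f})")
```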

The computation of a Pearson product- 
moment coefficient of correlation between the 
total score on How Supervise? and the total 
raw score on the scale for rating supervisory 
performance yielded an r of .108. On the
basis of this value of r with the present data,
rejection of the null hypothesis is not
justified: the obtained r is well below the
.181 that would be required for significance
at the .01 level (7).
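The .01 cutoff for r follows from the t test for a correlation. With df = N - 2, the smallest significant |r| is t/√(t² + df). In the sketch, the critical t of about 2.60 (two-tailed .01 for roughly 200 df) is an input to be looked up. With df = 206 it gives a cutoff near .18, close to the .181 cited. Printed tables list only selected df values, which may account for the small difference. The same cutoff applies to the r of .35 obtained later with the 25-item rescoring.

```python
import math


def critical_r(t_crit, df):
    """Smallest |r| reaching significance when t_crit is the tabled t for df = N - 2."""
    return t_crit / math.sqrt(t_crit ** 2 + df)


cutoff = critical_r(2.60, 208 - 2)
for label, r in [("full test", 0.108), ("25-item rescoring", 0.35)]:
    print(label, r, "significant" if abs(r) > cutoff else "not significant")
```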

Item difficulties or D values as measured 
by the fraction of the group failing the item 
ranged from .01 to .64 with the median at 
.10. The total number right on How Super- 


Table 1
Results of the Item Analysis of How Supervise? (Form M)*

Item    Item           Internal       Item
No.     Difficulty**   Consistency    Validity
10      .01            .84            .58
12      .41            .47            .27
14      .41            .48            .27
15      .05            .53            .21
16      .01            .60            .30
34      .39            .41            .22
36      .29            .35            .25
37      .09            .43            -.26
54      .03            .67            .32
55      .06            .57            .23
56      .02            .44            .30
57      .02            .80            .22
58      .05            .65            .31
59      .01            .61            .70
68      .01            .40            .67
80      .01            .50            .25
83      .13            .58            .25
85      .02            .74            .27
87      .04            .46            .27
88      .02            .61            .21
90      .04            .50            .37
92      .01            .35            .24
96      .01            .60            .30
99      .06            .48            .30
100     .01            .21            .38


* All coefficients are significant at the .01 level of confidence.
** Item difficulty is reported in terms of the fraction of the group failing the item.




Table 2

Percentile Table for 208 Industrial Supervisors on How Supervise? (Form M)

              Test                        Test
Percentile    Score         Percentile    Score
99            100           40            78
98            98            37            77
97            97            33            76
96            96            30            75
95            95            26            74
94            94            22            73
92            93            20            72
90            92            18            71
88            91            16            70
86            90            14            69
84            89            12            68
79            88            10            67
74            87            8             66
67            86            7             64
66            85            6             63-62
64            84            5             61-60
62            83            4             59-58
58            82            3             56
55            81            2             55-54
50            80            1             52-0
45            79


vise? ranged from 60 to 100 and averaged 
85.65 with an SD of 7.2. The internal con- 
sistency measures (biserial rs between indi- 
vidual items and total number right on How 
Supervise?) ranged from .00 to .84 with the 
median at .40. Item validities as measured
by the biserial rs between the individual items
and the criterion measures ranged from -.26
to .70, with the median coefficient at .07.³

Twenty-five items were found to have va- 
lidity coefficients significant at the .01 level 
of confidence. The item numbers along with 
their indices of difficulty, internal consist- 
ency, and validity are presented in Table 1. 
A test composed of these items would have a 
median difficulty of .04, a median internal 
³ The complete results of the item analysis showing measures of difficulty, internal consistency, and validity have been filed with the American Documentation Institute. Order Document No. 5044 from ADI Auxiliary Publications Project, Photoduplication Service, Library of Congress, Washington 25, D. C., remitting in advance $1.25 for microfilm or $1.25 for photocopies. Make checks payable to Chief, Photoduplication Service, Library of Congress.


Item Analysis of How Supervise? 


consistency measure of .50, and a median va- 
lidity coefficient of .27. When the test rec- 
ords were rescored for these 25 items and the 
total number right compared with the cri- 
terion score, the r was .35 which is well above 
the .18 that would be required for signifi- 
cance at the .01 level of confidence. 
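The .18 threshold quoted above can be checked from the usual t test for a correlation coefficient, r = t / sqrt(t² + N - 2). This sketch assumes N = 208 and a two-tailed test, and it approximates the t distribution with 206 df by the normal curve, which is adequate at this sample size; it is an illustration, not the author's computation.

```python
from math import sqrt
from statistics import NormalDist


def critical_r(n, alpha=0.01):
    """Smallest |r| significant at `alpha` (two-tailed) for n cases.

    Converts the critical t to r via r = t / sqrt(t**2 + df), with t
    approximated by the normal deviate (adequate for large df).
    """
    df = n - 2
    t = NormalDist().inv_cdf(1 - alpha / 2)
    return t / sqrt(t ** 2 + df)


def t_for_r(r, n):
    """t statistic for an obtained correlation r based on n cases."""
    return r * sqrt(n - 2) / sqrt(1 - r ** 2)


print(round(critical_r(208), 2))     # 0.18
print(round(t_for_r(0.35, 208), 1))  # 5.4
```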

The percentile table for total score on How 
Supervise? based on the 208 subjects of this 
study is presented in Table 2. 


Discussion 

The results of the computation of r be- 
tween scores on How Supervise? and success 
in a supervisory position indicate no appar- 
ent relationship. Hence, the use of this test 
in selecting, promoting, or placing members 
of supervision under circumstances similar to 
those encountered in this study is definitely 
not warranted. A review of the validity co- 
efficients for the individual items lends fur- 
ther support to this conclusion. 

Although the correlations between the 
items which were found to be selective in 
terms of the criterion were generally low, a 
review of the statements contained in these 
items might suggest directions which future 
research should follow in dealing with the 
prediction of supervisory success. The one 
item which had a significant negative correla- 
tion was item 37. The supervisors who were 
rated as successful did not feel that “Requir- 
ing supervisors to submit in writing their rea- 
sons for firing or penalizing any employee” 
was a desirable practice. Being familiar with 
the policies and practices of the company 
whose employees served as subjects, the au- 
thor feels certain that the subjects interpreted 
the statement as applying only to penalties
and not to firing since the termination of an 
employee would not take place without thor- 
ough investigation and documentation. The 
successful supervisor perhaps felt that being 
required to make a statement in writing re- 
garding any penalty, however slight, might 
be a step toward depriving him of responsi- 
bility and independence in the operation of 
his department. 

The statistical analyses of the data indi- 
cated the following tendencies among the 
supervisors who were rated less successful. 




They did not feel that the following supervisory practices were undesirable:⁴

1. “Using production records alone to de- 
termine which worker to recommend for pro- 
motion” (item 10). 

2. “Prohibiting conversation between work- 
ers on routine jobs” (14). 

3. “Making an example of one worker to 
prevent further trouble with others” (12). 

4. “Putting a loud individual in his place 
with a sarcastic remark” (15). 

5. “Selecting supervisors according to how 
much they know about the different jobs they 
will supervise” (34). 

6. “Fining employees for violations of 
rules” (36).

In addition, they did not feel that “Ex- 
plaining to workers who submit nonusable 
suggestions why their ideas can not be put to 
use” was a desirable practice (16). Further, 
the less successful supervisors tended to either 
agree with or be uncertain about the follow- 
ing statements. 

1. “So-called mental fatigue is actually 
nothing but laziness” (54). 

2. “Most employees do better work if they 
get a good bawling out every so often” (55). 

3. “The only guarantee of good work is a 
fat pay envelope” (56). 

4. “Praising workers for good work only
leads to demands for more pay” (57). 

5. “The average worker cares little about 
what others think of his job so long as the 
pay is good” (58). 

6. “The worker’s opinion of his supervisor 
is not very important” (59). 

7. “The only important requirement of a 
good supervisor is a complete understanding 
of the jobs he is to supervise” (68). 

8. “The nature of the supervisor’s job 
makes it necessary for him to be unpopular 
with his workers” (80). 

9. “The best way to handle tough workers 
is to be tougher than they are” (83). 

10. “The average supervisor can do noth- 
ing to reduce absenteeism” (85). 

11. “Constant demands on the time of top executives make it impractical for them to spend any time in actual conversation with workers” (87).

12. “Lectures are usually better than conferences for getting ideas across to workers” (88).

13. “You can tell when a person is lying by noting whether or not he looks you in the eye” (90).

14. “About half of the workers in our company are just naturally stubborn and uncooperative” (92).

15. “Supervisors should be completely relieved from duties concerning production planning and materials handling” (96).

16. “Rapid learners are usually quick forgetters” (99).

17. “The goals of management and labor are directly opposed and must always be in conflict with each other” (100).

⁴ The correct response to each of these items was “Undesirable.” The less successful supervisors tended to respond to the practice as “Desirable” or to be “uncertain” about it.

The types of statements which are selective 
would seem to indicate that future research 
might well be concerned primarily with the 
elimination of individuals whose prospects of 
becoming successful supervisors are poor. The 
profile of an individual who would respond to 
the statements in the way just outlined sug- 
gests that the temperament of an individual 
may be a more important factor in his per- 
formance as a supervisor than his supervi- 
sory knowledge. The possibility that the re- 
sponses given by the unsuccessful supervisors 
tend to characterize an authoritarian person- 
ality structure should receive further investi- 
gation. 

If it is desirable to use the test in its pres- 
ent form with subjects similar to those of the 
present study, the answers might be scored 
on the basis of the 25 items listed above. 
The statistical analysis suggests that a total 
score based upon these 25 items has some 
value as a predictor of supervisory success. 
Further validation of this conclusion with 
other subjects would be desirable, however. 
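Rescoring on a subset of items amounts to counting agreements with a key restricted to those items. In this sketch the keyed answers are inferred from the Discussion: footnote 4 gives "Undesirable" for items 10, 12, 14, 15, 34, and 36; the text implies "Desirable" for item 16 and disagreement with the 17 statements listed (items 54 through 100). The response labels ("Agree", "Uncertain", "Disagree"), the dictionary format, and the function name are assumptions, not the published scoring key. Item 37 is left out because the text does not state which answer was keyed, so this is a 24-item illustration rather than the author's 25-item key.

```python
# Keyed answers inferred from the Discussion, NOT the published scoring key.
# Item 37 is omitted because its keyed answer is not stated in the text.
UNDESIRABLE_ITEMS = [10, 12, 14, 15, 34, 36]
DESIRABLE_ITEMS = [16]
DISAGREE_ITEMS = [54, 55, 56, 57, 58, 59, 68, 80, 83,
                  85, 87, 88, 90, 92, 96, 99, 100]

SHORT_KEY = {
    **{item: "Undesirable" for item in UNDESIRABLE_ITEMS},
    **{item: "Desirable" for item in DESIRABLE_ITEMS},
    **{item: "Disagree" for item in DISAGREE_ITEMS},
}


def short_form_score(answers, key=SHORT_KEY):
    """Number right on the keyed subset; `answers` maps item number -> response."""
    return sum(1 for item, keyed in key.items() if answers.get(item) == keyed)


# A hypothetical answer sheet that matches the key except on two items.
sheet = dict(SHORT_KEY)
sheet[55] = "Agree"
sheet[90] = "Uncertain"
print(len(SHORT_KEY), short_form_score(sheet))   # 24 22
```

Unanswered items simply count as wrong, since `answers.get(item)` returns `None` for them.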

The results of the present study indicate 
that the items are not sufficiently difficult to 
make Form M of How Supervise? a satisfac- 
tory test to be used with subjects having the 
backgrounds of those in this investigation. 
The most difficult questions, items 45 and 46, were failed by 62% and 64% of the group, respectively. Practically all the other items were consistently passed by a large majority of the group, as indicated by the average item difficulty of .14. All the subjects of the present study were college graduates. The question of whether How Supervise? could be used in the selection, placement, upgrading, etc., of non-college graduates remains unanswered. Studies in this area should include some control of intelligence, since reports by Millard (14) and Wickert (27) suggest that How Supervise? may be a test of intelligence for individuals below certain educational levels.

The biserial rs between total number right 
on How Supervise? and each item are mostly 
low but positive and, except for four items, 
significant at the .01 level of confidence. 
The general indication is that the items are 
consistently measuring some quality. It seems 
conceivable, in the light of the face validity 
of the items, that this quality could be su- 
pervisory knowledge. If this were true then 
the results of the present study might be 
taken as an indication that supervisory knowl- 
edge of the range and level measured by How 
Supervise? is not a factor in achieving suc- 
cess as a supervisor. 

The percentile table in Table 2 is pre- 
sented as a standard of performance of col- 
lege graduates engaged in supervisory posi- 
tions on How Supervise?. It is interesting to 
note that the median of this group falls at a 
score of 80 while the median score for “top 
management supervisors” in the standardiza- 
tion group used by the authors of the test is 
72 (3). 


Summary and Conclusions 


Two hundred and eight college graduates who were members of the supervisory staff of a large manufacturing organization took Form M of How Supervise? and were rated for supervisory performance on the rating scale devised and published by King and Wingert (12). Statistical analysis indicated no relation between scores on How Supervise? and rated success in a supervisory position. An item analysis indicated that the items consistently measured some quality, possibly supervisory knowledge. The items in the test were found to be too easy for the group of subjects and, for the most part, not valid predictors of supervisory success as measured under the conditions of the present study. Test records for the subjects were rescored on the basis of the 25 items which had significant coefficients of validity. The r between total number right on these items and the criterion was found to be .35.

The following conclusions were drawn: 

1. Under the conditions of the present 
study, scores on How Supervise? do not pre- 
dict success in a supervisory position. 

2. The items in Form M of How Super- 
vise? are not sufficiently difficult for college 
graduates having the backgrounds of the sub- 
jects of the present study. 

3. Form M of How Supervise? is consist- 
ently measuring some quality, possibly super- 
visory knowledge. 


Received January 23, 1956. 


References

1. Belman, H. S., & Evans, R. N. Selection of students for a trade and industrial education curriculum. J. educ. Psychol., 1951, 42, 52-58.

2. Canter, R. R., Jr. A human relations training program. J. appl. Psychol., 1951, 35, 421-425.

3. File, Q. W., & Remmers, H. H. Manual for How Supervise? (Rev. ed.) New York: Psychological Corporation, 1948.

4. File, Q. W., & Remmers, H. H. Studies in supervisory evaluation. J. appl. Psychol., 1946, 30, 421-425.

5. File, Q. W. The measurement of supervisory quality in industry. Unpublished doctor's dissertation, Purdue Univer., 1944.

6. File, Q. W. The measurement of supervisory quality in industry. J. appl. Psychol., 1945, 29, 323-337.

7. Garrett, H. E. Statistics in psychology and education. New York: Longmans, Green, 1948. Pp. 220, 299, 347-352.

8. Guilford, J. P. Fundamental statistics in psychology and education. New York: McGraw-Hill, 1950. Pp. 209, 324, 492, 499.

9. Jurgensen, C. E. Foreman training based on the test How Supervise?. Personnel J., 1949, 28, 123-127.

10. Karn, H. W. Performance on the File-Remmers test, How Supervise?. J. appl. Psychol., 1949, 33, 534-539.

11. Katzell, R. A. Testing a training program in human relations. Personnel Psychol., 1948, 1-2, 319-329.

12. King, J. E., & Wingert, Judith W. Merit rating series: performance-supervisor. Chicago: Industrial Psychology, Inc., 1953.

13. Millard, K. A. A personnel study of supervisors in business and industry. Unpublished doctor's dissertation, Univer. of Minnesota, 1947.

14. Millard, K. A. Is How Supervise? an intelligence test? J. appl. Psychol., 1952, 36, 221-224.

15. Miller, F., & Remmers, H. H. Studies in industrial empathy: II. Management's attitude towards industrial supervisors and their estimates of labor's attitude. Personnel Psychol., 1950, 3, 33-40.

16. Mosel, J. N., & Tsacnaris, H. J. Evaluating the supervisory training program. J. Personn. Adm. industr. Relat., 1954, 1, 99-104.

17. Mosier, C. I. Review of How Supervise? In O. K. Buros (Ed.), The third mental measurements yearbook. New Brunswick: Rutgers Univer. Press, 1949. Pp. 727-728.

18. Pond, Bette B. Performance on File-Remmers How Supervise? test before and after supervisory training. Unpublished master's thesis, Pennsylvania State College, 1951.

19. Remmers, H. H., Remmers, L., & Miller, F. A quantitative study of reciprocal empathy of labor leaders and industrial management. Amer. Psychologist, 1949, 4, 282-283. (Abstract)

20. Remmers, Lois J., & Remmers, H. H. Studies in industrial empathy: I. Labor leaders' attitudes toward industrial supervision and their estimates of management's attitudes. Personnel Psychol., 1949, 2, 427-436.

21. Sartain, A. Q. Relation between scores on certain standard tests and supervisory success in an aircraft factory. J. appl. Psychol., 1946, 30, 328-339.

22. Slocombe, C. S. Appraisal of Mr. File's study. Personnel J., 1946, 24, 251-254.

23. Speroff, B. J. Relationship between empathic ability and supervisory knowledge. J. Personn. Adm. industr. Relat., 1954, 1, 195-197.

24. Van Zelst, R. H. Empathy test scores of union leaders. J. appl. Psychol., 1952, 36, 293-295.

25. Whyte, W. H., Jr. The fallacies of personality testing. Fortune, 1954, 50, No. 3 (Sept.).

26. Wickert, F. R. How Supervise? scores before and after a course in psychology. J. appl. Psychol., 1952, 36, 388-392.

27. Wickert, F. R. Relationship between How Supervise?, intelligence and education for a group of supervisory candidates in industry. J. appl. Psychol., 1952, 36, 301-303.


Journal of Applied Psychology
Vol. 40, No. 6, 1956


Preference Measurement by the Methods of Successive Intervals 
and Monetary Estimates 


Purnell H. Benson 
Drew University 
and John H. Platten, Jr. 
J. A. Ward, Inc. 


The methods of successive intervals and 
paired comparisons have been proposed to 
measure product preferences in applying the 
marginal preference model to consumer be- 
havior (1). These methods satisfy the rule 
of addition that the sum of preference differ- 
ences AB and BC equals AC for any three 
points along a linear qualitative continuum 
(2). 

This rule may not be a sufficient condition 
for the method of measurement used to es- 
tablish the relationship of equal marginal 
preferences in applying the model. For large 
ranges in preference the magnitude of the 
error in expressing preferences between quali- 
tative points may change. The error in choos- 
ing between a $101 article and a $102 ar- 
ticle is apparently greater than the error in 
choosing between a $1 and a $2 article. In 
using measurement methods based upon a 
changing judgmental error, the relationship 
of equal marginal preferences will not hold 
accurately unless the range of preference 
variation included in the measurements is 
relatively limited. 

Furthermore, it may be noted that, in 
using the methods of successive intervals or 
paired comparisons, the rule of addition is 
satisfied by the measurement of preferences 
for single articles, but not by the measure- 
ment of preferences for combinations of ar- 
ticles presented simultaneously as stimuli for 
choice. If the preference difference between 
a $101 article and a combination of a $101 
article and a $1 article is measured, the con- 
sumer can separate the combination in for- 
mulating his judgments. The judgmental 
error is apparently not then the same as if 
a $101 and a $102 article are compared. 
Methods of preference measurement based 
upon this error as a unit satisfy the rule of 


addition for unilinear series, but not for 
multilinear series in which there are alterna- 
tive routes of judgment in proceeding from 
one point to another in the hierarchy of 
preferences. 

A method of measuring preference in which 
the metric unit does not change for objects 
of different monetary value is provided by 
asking the individual to estimate how much 
he would be willing to pay to secure the ob- 
ject of his preference. Such a method of 
measurement possesses little novelty, but 
seems useful in supplying the corrective fac- 
tor for adjusting the judgmental error unit 
used in the methods of successive intervals 
and paired comparisons. 

This paper examines the relationship be- 
tween preferences as measured by the method 
of successive intervals and preferences as 
measured by monetary estimate. If the re- 
lationship is linear, this means that the judg- 
mental error unit remains constant relative 
to changes in monetary value. If the rela- 
tionship is nonlinear, the required correction 
factor can be defined for changes in the 
measuring unit when preferences for cheap 
and expensive articles are compared. 


Description of the Data 


A questionnaire containing two parts was 
administered to 102 individuals of middle 
socioeconomic background who were inter- 
viewed outside of neighborhood supermar- 
kets. The first portion of the questionnaire 
read to respondents was: 


I have in my hand a list of brand-new articles of 
equal retail value. Imagine that you are at some 
big affair where these articles are being given away 
free as door prizes. They are not to be re-sold by 
the people who win them. As I read each one to 
you, tell me how much you would like to win it. 




Preference Measurement 


The ten articles on the list are: 


A $50 Rug 

A $50 Radio 

A $50 TV set 

A $50 Camera 

A $50 Bicycle 

A $50 Portable typewriter 
A $50 Set of china 

A $50 Easy chair 

A $50 Vacuum cleaner 

A $50 Dress 


The preference categories to be checked are: 


Like extremely 
Like very much 
Like somewhat 
Like very little 


In the second portion of the questionnaire, 
the instruction to respondents was: 


Now, let’s imagine you are at an auction sale, and 
these same brand-new articles are being auctioned 
off to the highest bidder. For each one of them tell 
me what is the most money you would be likely to 
bid for it. Once again remember that none of them
could be re-sold. You would have to be getting 
them for yourself or your family only. 


The monetary values selected and the corresponding categories checked are tabulated in Table 1. The ten articles and the 102 respondents provide 1020 observations, six of which were removed because of “don't know” replies. Class intervals were selected to provide an even distribution of monetary values and a complete matrix for computation of scale values by the method of successive intervals.

Fig. 1. Relationship of preference-scale values obtained by the method of successive intervals and preference-scale values given by monetary estimates. (Horizontal axis: monetary estimate in dollars; vertical axis: successive interval scale value.)
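The computation of scale values from a complete matrix can be sketched as follows. This is a simplified version, assuming equal dispersions for all rows: each row's cumulative proportions (from "Like very little" upward) are converted to unit normal deviates, and the row's scale value is taken as a constant minus the mean deviate. Applied to the counts in Table 1, this sketch gives differences between rows that agree with the differences between the published scale values to within about .01 (for example, about .45 between the $0 and $1-5 rows). Because the origin is arbitrary, values are reported relative to the $0 row; the function names and data layout are illustrative.

```python
from statistics import NormalDist, mean

# Table 1 counts, columns ordered from the lowest category upward:
# Like very little, Like somewhat, Like very much, Like extremely.
ROWS = {
    "$0":     [343, 30, 28, 13],
    "$1-5":   [74, 13, 9, 10],
    "$6-10":  [49, 25, 26, 37],
    "$11-15": [17, 19, 23, 31],
    "$16-20": [7, 6, 25, 40],
    "$21-25": [7, 10, 28, 36],
    "$26-30": [2, 3, 13, 25],
    "$31-50": [2, 9, 17, 37],
}


def boundary_deviates(counts):
    """Unit normal deviates for the cumulative proportions at each boundary."""
    total = sum(counts)
    cumulative = 0
    deviates = []
    for c in counts[:-1]:          # the top category has no upper boundary
        cumulative += c
        deviates.append(NormalDist().inv_cdf(cumulative / total))
    return deviates


def successive_interval_scale(rows):
    """Scale values relative to the first row (equal dispersions, complete matrix)."""
    raw = {label: -mean(boundary_deviates(c)) for label, c in rows.items()}
    origin = next(iter(raw.values()))
    return {label: value - origin for label, value in raw.items()}


for label, value in successive_interval_scale(ROWS).items():
    print(f"{label:>7}  {value:5.2f}")
```

With a complete matrix the category boundaries drop out of the differences between rows, which is why only the row means of the deviates are needed here.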

An exponential function of the type $Y = aZ^{b} + c$ was used to fit the relationship between scale values and monetary estimates. The constants for this were found by calculating the coefficient of correlation for various values of $b$ and deriving the maximum. The equation obtained is

$$Y = .428\,Z^{.5} \qquad [1]$$

where Z is the monetary estimate and Y is the preference measured by the method of successive intervals. The origin for the scale values of Y was chosen after analysis to eliminate the c constant. The curve and the points upon which it is based are given in Fig. 1.

Table 1
Scale Values Obtained by the Method of Successive Intervals Compared with Monetary Estimates

Monetary Estimate      Category of Preference
Class      Class       Like        Like        Like        Like          Scale
Interval   Mean        Extremely   Very Much   Somewhat    Very Little   Value*
$0         $0          13          28          30          343           .09
$1-5       $4.52       10          9           13          74            .54
$6-10      $9.85       37          26          25          49            1.34
$11-15     $14.58      31          23          19          17            1.70
$16-20     $19.49      40          25          6           7             2.23
$21-25     $24.86      36          28          10          7             2.13
$26-30     $29.79      25          13          3           2             2.48
$31-50     $40.60      37          17          9           2             2.45

* Origin for scale values was selected to eliminate one of the constants in the functional relationship fitted to scale values and monetary estimates.
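The fitting procedure described above (trying several exponents and keeping the one that maximizes the correlation) can be sketched as a grid search. The sketch assumes the class means and scale values of Table 1 as data, a grid of exponents from .05 to 1.00 in steps of .01, and ordinary least squares for a and c once the exponent is fixed; the grid spacing and helper names are illustrative choices, not the authors' procedure. With these data the correlation peaks at an exponent close to .5 (r near .97), and at b = .5 the least-squares slope is close to .428 with an intercept close to zero, consistent with Equation 1.

```python
from math import sqrt
from statistics import mean

Z = [0.0, 4.52, 9.85, 14.58, 19.49, 24.86, 29.79, 40.60]   # class means ($)
Y = [0.09, 0.54, 1.34, 1.70, 2.23, 2.13, 2.48, 2.45]       # scale values


def pearson_r(x, y):
    """Product-moment correlation between two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def fit_power(z, y, b):
    """Least-squares a and c in y = a * z**b + c for a fixed exponent b."""
    x = [v ** b for v in z]
    mx, my = mean(x), mean(y)
    a = sum((u - mx) * (w - my) for u, w in zip(x, y)) / sum((u - mx) ** 2 for u in x)
    return a, my - a * mx


def best_exponent(z, y, grid):
    """Exponent in `grid` whose transformed z correlates most highly with y."""
    return max(grid, key=lambda b: pearson_r([v ** b for v in z], y))


grid = [i / 100 for i in range(5, 101)]
b = best_exponent(Z, Y, grid)
print(b, round(pearson_r([v ** b for v in Z], Y), 3))
a, c = fit_power(Z, Y, 0.5)
print(round(a, 3), round(c, 3))
```

Maximizing r over the exponent is equivalent to minimizing the residual variance of the linear fit of Y on Z^b, so c is absorbed by the intercept during the search and can afterwards be removed by shifting the origin of Y.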


Discussion and Conclusion 


The principle that consumers make pur- 
chases at those points on their buying con- 
tinuum where marginal preferences are equal 
assumes that preferences measured for arti- 
cles of different costs are based upon the 
same sized measuring unit. When the method 
of successive categories or other method based 
upon a judgmental error unit is applied over 
a range in preference variation which is not 
small in extent, a correction is needed. This 
correction can be supplied by means of the formula

$$Y' = Y^{1/b} \qquad [2]$$

in which case the relationship between Y', the corrected preference measurement, and Z, the monetary value, becomes linear.
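Why the correction linearizes the relationship can be seen by substituting $Y = aZ^{b}$ (Equation 1, with the origin chosen so that $c = 0$) into the correction formula [2]. The numerical constant below assumes the fitted values $a = .428$ and $b = .5$.

```latex
Y' = Y^{1/b} = \left(a Z^{b}\right)^{1/b} = a^{1/b}\, Z,
\qquad
a^{1/b} = (.428)^{2} \approx .183,
\quad\text{so}\quad Y' \approx .183\, Z .
```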

The value found for b and the type of function utilized here are to be regarded tentatively until more complete studies have been made. The value for the linear constant a apparently depends upon the amount of income which the consumer has for disposal. Research is needed to establish the values for a appropriate to different income levels and demands upon family resources.

Alternatively, the monetary estimate may be used as a method of measuring preference in the application of the marginal preference model to consumer behavior. Since monetary amounts are uniformly additive, no correction is apparently needed from this standpoint. Estimates made by the consumer usually require qualification, however, if they are taken as indicative of actual buying behavior.

It remains to be disclosed whether mone- 
tary estimates provide as precise preference 
measurements as the values obtained by the 
methods of successive categories or paired 
comparisons. In general, those questionnaire 
data which require the respondent to make 
complex rather than simple judgments are 
less precise, and for this reason data from 
choices rather than money estimates may 
provide the more precise measurements, with 
such studies as the one reported here giving 
the corrective adjustment. The size of the 
correlation coefficient found here, .968, suggests that comparable results can be obtained by either method for measuring consumer preferences.


Received April 25, 1956. 


References 


1. Benson, P. H. A model for the analysis of consumer preference and an exploratory test. J. appl. Psychol., 1955, 39, 375-381.

2. Gulliksen, H. O. Paired comparisons and the logic of measurement. Psychol. Rev., 1946, 53, 199-213.


The Relationship Between Chi Square and Size of Sample:
the General Case

Herbert D. Kimmel
Human Factors Research, Inc., Los Angeles

In an earlier paper (3) a procedure was suggested for achieving some control over the number of independent observations which might be required to obtain statistical significance in studies using a chi-square test in two-celled tables. The suggestion was based on the fact that, in such cases, the conventional expression for computing chi square reduces to $4(f_o - f_e)^2/N$. By substituting the value of chi square required at the desired level of significance, the relationship between N and $f_o - f_e$ was defined and could be described graphically. Thus the significance of obtained differences could be determined quickly by visual inspection of the graph. It was shown that the use of this graph would enable experimenters to plan data collection in phases of 10, 20, or 30 Ss at a time. After each period of data collection, it could be determined from the graph whether the difference between the observed frequencies and those expected by chance was great enough to exceed the desired level of significance. In this way the size of N could be kept reasonably close to the minimum required by the actual difference in the population rather than fixed arbitrarily on a priori grounds.

The present paper extends the logic underlying this procedure to the general chi-square situation, regardless of the number of cells in the table. It should be noted, in making the transition from the special to the general case, that the fortuitous disappearance of the summation operator in the special case does not occur in the general case.

The conventional expression,

$$\chi^2 = \sum \frac{(f_o - f_e)^2}{f_e} \qquad [1]$$

has been shown by Cramér (1, p. 417) to be identical with¹

$$\chi^2 = \frac{n}{N} \sum f_o^2 - N \qquad [2]$$

in which N = number of observations and n = number of cells. Solving for $\sum f_o^2$, this becomes

$$\frac{(\chi^2 + N)\,N}{n} = \sum f_o^2 \qquad [3]$$

For any particular situation, once an acceptable level of significance has been decided upon, the left-hand expression in Equation 3 can be specified for any value of N. A table or graph may be made before data collection begins, describing the relationship between $\sum f_o^2$ and sample size (N) at the chosen level of significance. Then it is only necessary to square the observed frequencies and compare the sum of these squares with the value required for significance for that size of sample. The data collection could be terminated when this value was attained.

¹ Notation changed; Cramér's $p_i$ is constant.

Table 1
$\sum f_o^2$ Required for Significance at .05 and .01 Levels for Several Ns and ns*

                      Number of Observations (N)
n    Level   40       50       60       70       80       90       100
3    .05     613.21   933.18   1319.82  1773.12  2293.09  2879.73  3533.03
3    .01     656.13   986.83   1384.20  1848.23  2378.93  2976.30  3640.30
4    .05     478.15   722.69   1017.22  1361.76  1756.30  2200.84  2695.38
4    .01     513.41   766.76   1070.12  1423.47  1826.82  2280.17  2783.50
5    .05     395.90   594.88   833.86   1112.83  1431.81  1790.78  2189.76
5    .01     426.22   632.75   879.32   1165.88  1492.43  1858.99  2265.54
6    .05     340.47   508.92   710.70   945.82   1214.27  1516.05  1851.17
6    .01     367.24   542.35   750.86   992.67   1267.81  1576.29  1918.10
7    .05     300.52   447.08   622.22   825.92   1058.19  1319.04  1608.45
7    .01     324.64   477.23   658.39   868.12   1106.42  1373.30  1668.74
8    .05     270.34   400.42   555.50   735.59   940.67   1170.75  1425.84
8    .01     292.38   427.97   588.56   774.16   984.75   1220.34  1480.94
9    .05     -        363.93   503.38   665.05   848.95   1055.07  1283.41
9    .01     -        389.33   533.93   700.70   889.69   1100.90  1335.44
10   .05     -        334.60   461.51   608.43   775.35   962.27   1169.19
10   .01     -        358.33   490.00   641.66   813.23   1004.99  1216.66

* No values have been given for situations in which the theoretical cell frequencies are less than 5. In these situations it is recommended that a direct method of computing P be used. It should also be noted that the values given in this table have not been corrected for discontinuity in the case of small theoretical frequencies. According to Guilford, the correction should be applied in cases involving theoretical frequencies of 25 or less but only in two-celled tables (2, p. 279).
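The planning table and the stopping rule can be sketched in a few lines. This sketch assumes equal expected frequencies in all n cells (Cramér's constant p_i = 1/n) and hard-codes critical chi-square values for 2 to 9 df at the .05 and .01 levels as printed in older tables, including the 11.341 that the paper uses for 3 df at .01 (some modern tables differ in the third decimal). The function names and the frequencies in the usage lines are hypothetical.

```python
# Critical chi-square values, df: (.05 level, .01 level), as printed in older tables.
CHI2_CRIT = {
    2: (5.991, 9.210), 3: (7.815, 11.341), 4: (9.488, 13.277),
    5: (11.070, 15.086), 6: (12.592, 16.812), 7: (14.067, 18.475),
    8: (15.507, 20.090), 9: (16.919, 21.666),
}


def required_sum_fo2(n_obs, n_cells, level=0.05):
    """Equation 3: (chi2 + N) * N / n, the sum of squared f_o needed for significance."""
    chi2 = CHI2_CRIT[n_cells - 1][0 if level == 0.05 else 1]
    return (chi2 + n_obs) * n_obs / n_cells


def chi_square(observed):
    """Equation 1 with equal expected frequencies f_e = N / n."""
    fe = sum(observed) / len(observed)
    return sum((fo - fe) ** 2 / fe for fo in observed)


def significant(observed, level=0.05):
    """Stopping rule: compare the sum of f_o**2 with the tabled requirement."""
    n_obs, n_cells = sum(observed), len(observed)
    return sum(fo ** 2 for fo in observed) >= required_sum_fo2(n_obs, n_cells, level)


print(round(required_sum_fo2(40, 3), 2))          # 613.21
print(round(required_sum_fo2(40, 4, 0.01), 2))    # 513.41
counts = [18, 8, 7, 7]                            # hypothetical 4-choice data, N = 40
print(significant(counts, 0.01))                  # False
```

For the hypothetical counts, the sum of squares is 486, short of the 513.41 required, and Equation 2 gives the same verdict: (4/40)(486) - 40 = 8.6, below 11.341.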

For example, suppose a preference experiment required subjects to choose the most preferred of four stimuli. A chance hypothesis would predict N/4 preferences for each stimulus. Assuming that the .01 level of significance were required for rejection of this chance hypothesis, $\chi^2$ with 3 df would have to equal or exceed 11.341. Substituting these values in Equation 3,

$$\frac{(11.341 + N)\,N}{4} = \sum f_o^2$$

It can be seen that the minimum necessary value of $\sum f_o^2$ to obtain significance can [...] from 3- to 10-celled tables.

Received April 14, 1956.


References

1. Cramér, H. Mathematical methods of statistics. Princeton: Princeton Univer. Press, 1951.

2. Guilford, J. P. Fundamental statistics in psychology and education. New York: McGraw-Hill, 19[...].

3. Kimmel, H. D. The relationship between chi square and size of sample in two-celled tables. J. appl. Psychol., 1956, 40, 61-62.